DOCUMENT RESUME

ED 142 563          95          TM 006 175

AUTHOR          Law, Alexander I.; Bronson, William H.
TITLE           Program Evaluator's Guide.
INSTITUTION     California State Dept. of Education, Sacramento;
                Educational Testing Service, Princeton, N.J.
SPONS AGENCY    Office of Education (DHEW), Washington, D.C.
PUB DATE        Mar 77
NOTE            335p.; For related document, see TM 006 [number illegible]
AVAILABLE FROM  Educational Testing Service, Evaluation Improvement
                Program, Princeton, New Jersey 08540 (1-24 copies,
                $12.00 ea., 25 or more, $10.00 ea.)
EDRS PRICE      MF-$0.83 HC-$18.07 Plus Postage.
DESCRIPTORS     Data Collection; Educational Objectives; *Evaluation
                Methods; *Evaluators; *Guides; Information
                Utilization; Inservice Education; Instructional
                Materials; *Measurement Techniques; *Planning;
                *Program Evaluation; Sampling; Statistical Analysis;
                Test Construction; Test Interpretation; Test Results;
                Test Selection
IDENTIFIERS     California; *California Evaluation Improvement
                Project



ABSTRACT

This guide presents detailed information concerning the purposes and
process of program evaluation, the role of the evaluator, and the
development of an evaluation plan or design. Instruction is provided in
selecting or developing assessment instruments, collecting and analyzing
data, reporting evaluation results, and applying the findings. The manual,
which includes learning exercises, was developed under the California
Evaluation Improvement Project as a study guide for use in inservice
training workshops for program evaluators, teachers, principals, curriculum
specialists, and other individuals responsible for school programs and
those who aid educational administrators in ascertaining program
effectiveness. (SVH)



* Documents acquired by ERIC include many informal unpublished
* materials not available from other sources. ERIC makes every effort
* to obtain the best copy available. Nevertheless, items of marginal
* reproducibility are often encountered and this affects the quality
* of the microfiche and hardcopy reproductions ERIC makes available
* via the ERIC Document Reproduction Service (EDRS). EDRS is not
* responsible for the quality of the original document. Reproductions
* supplied by EDRS are the best that can be made from the original.






U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL
INSTITUTE OF EDUCATION POSITION OR POLICY.





The Evaluation Improvement Program



PERMISSION TO REPRODUCE THIS COPYRIGHTED MATERIAL HAS BEEN
GRANTED BY [name illegible] TO ERIC AND ORGANIZATIONS OPERATING
UNDER AGREEMENTS WITH THE NATIONAL INSTITUTE OF EDUCATION.
FURTHER REPRODUCTION OUTSIDE THE ERIC SYSTEM REQUIRES
PERMISSION OF THE COPYRIGHT OWNER.



CALIFORNIA STATE DEPARTMENT OF EDUCATION
WILSON RILES, SUPERINTENDENT



PROGRAM EVALUATOR'S GUIDE

A manual developed as part of the California Evaluation
Improvement Project under the leadership of the California
State Department of Education, Wilson Riles, Superintendent
of Public Instruction and Director of Education

Alexander I. Law, Chief
Office of Program Evaluation and Research

William H. Bronson, State Director
California Evaluation Improvement Project






The activity which is the subject of this report was supported
in whole or in part by the U.S. Office of Education, Department
of Health, Education, and Welfare. However, the opinions expressed
herein do not necessarily reflect the position or policy of the
U.S. Office of Education, and no official endorsement by the
U.S. Office of Education should be inferred.



Copyright (c) 1977 by the California State Department of Education.
All rights reserved.



First Edition, March 1977
Published by Educational Testing Service, Princeton, New Jersey



Educational Testing Service, a nonprofit organization, is
an Equal Opportunity Employer.



FOREWORD 



Evaluation of school programs is becoming more of a necessity for survival
than a luxury enjoyed only by affluent districts. As financial resources
diminish, decisions on how to allocate the available funds must be made.
While basic educational research provides much valuable information, that
information is usually not the kind on which day-to-day decisions about
specific educational programs are based. Program evaluation, as perceived by
the California Evaluation Improvement Project, is a means by which useful
information is collected and analyzed by a local educational agency for its
own use.

While most educators have had courses in testing and measurement and some
contact with educational research, there has been little in their training to
prepare them for conducting a systematic evaluation of a local school or
classroom program. Of course, evaluation has been going on for many years,
but it has most frequently been at the intuitive level, with little consistency
and little impact on the total educational program.

California's response to this problem has been to develop a training program
in basic evaluation concepts and skills, which is directed to the classroom
teacher, the principal, the curriculum director, or program manager who wants
to evaluate a local program to assist in local decision making.

One of the strengths of this training program is that it was developed and
field tested throughout California by a group of educators whose backgrounds
were primarily in the areas of program planning, curriculum, administration,
and supervision. Evaluation specialists were used extensively as consultants
as the workshop training program was developed, but the emphasis has been kept
on how evaluation information could help in answering questions raised by the
developers, whose orientation was basically that of program managers.

There is no magic formula to solve the problems involved in educating the
youth of America; but I hope that this training program in basic evaluation
concepts and skills will be useful to local schools and districts as they
work toward improvement of the educational process.



WILSON RILES
Superintendent of Public
Instruction






PREFACE 



Program evaluation, through which a school or district evaluates its own
programs for its own purposes, is different from educational research. It is
also different from a state testing program and from gathering information
required by the state about achievement levels in specially funded programs.
Program evaluation at the school or district level should be something the
school or district does for itself for its own purposes, rather than something
an outsider does for it.

Program evaluation should be an integral part of the program-planning process.
Provisions should be built into each program to collect information that will
indicate progress towards the program's objectives, the degree of
implementation of the plan, and other information required to make rational
decisions about the program.

Program evaluation is of little value unless some use is made of its results.
A part of the evaluation process includes identifying potential audiences for
the evaluation report and finding out what kinds of information would be
useful to them. Providing useful, timely information to people who can use it
is one of the best ways of ensuring that the evaluation reports will be
used.

These concepts are basic to the workshop materials that have been developed by
the California Evaluation Improvement Project. The materials were designed to
be as practical as possible for the educational practitioner, and it is our
hope that the reader will find these concepts useful and will be able to apply
them to future planning as well as to programs that are currently in operation.



Alexander I. Law
Chief, Office of Program
Evaluation and Research



William H. Bronson
State Director, California
Evaluation Improvement Project



INTRODUCTION TO THE EVALUATION IMPROVEMENT PROGRAM 

Educational Testing Service (ETS) is pleased to have been selected by
the California Department of Education as publisher, under an exclusive
license, of the California Evaluation Improvement Project (EIP) materials.
These constitute a course of instruction for individuals responsible for
school programs and for those who help educational administrators ascertain
program effectiveness. At the time of initial publication, spring 1977, the
materials in EIP consist of the following:

• Program Evaluator's Guide. The Guide is a basic manual which provides
  in considerable detail background knowledge on the steps involved in
  planning and carrying out a program evaluation. It is designed as a
  study guide and learning tool for use in inservice training workshops
  for program evaluators.

• Workbook on Program Evaluation. The Workbook has two purposes. It
  can be put to use as a learning and instructional aid while one
  masters the procedures, techniques, and methods of program evaluation.
  Used this way, it helps the practitioner summarize and put into
  practice the subject matter presented in the Program Evaluator's
  Guide. It is best used, however, as a working notebook which the
  trained program evaluator can use for recording his or her plans as
  they are made and for making notes on program and program evaluation
  activities and events during the course of the program year. Used
  in this way, it helps the program evaluator keep complete records of
  the important information related to the program evaluation. It will
  probably be most useful when an interim or end-of-year program
  evaluation report has to be prepared, for much of the information
  needed at those critical times will already have been made a matter
  of record in the Workbook.

• Evaluation Trainer's Guide. This volume is a companion to the
  Program Evaluator's Guide. It supplies background and supporting
  materials for use by instructors conducting program evaluation
  workshops. Graphic art is provided for visual aids in support of a
  variety of subjects.



Publication early in 1977 of the first of a continuing offering of EIP
materials is consistent with ETS's long-term commitment to help advance the
art of program evaluation in the elementary and secondary schools. The EIP
materials are expected to go through a number of printings under the ETS
imprimatur. Each successive printing will be a revised edition. Here we ask
the help and cooperation of the readership.




As you, as a program evaluation practitioner, identify parts of any of these
three works that could benefit from refinement and further development, or as
you think of experiences that would serve to illustrate points made in any of
the subject treatments, we hope that you will share your thoughts with
us.

We would like to see the Evaluation Improvement Program subjected to its own
program evaluation by those who use its materials. We would hope the
evaluation will be formative, not summative, in nature, for it is our intention
to cycle evaluative comment on each edition into significant improvements in
later ones. Present plans call for publication of the second edition, our
first revision, late in 1977, and constructively critical comments from
practitioners can be turned into refinements in print in very short order. Join
with us to make the EIP materials, initially well developed by the California
Evaluation Improvement Project, even better as time goes on. The California
Department of Education and Educational Testing Service have joined in the
common goal of making the EIP materials as practical and useful as they can be
made to be.



Jack R. Childress
Vice-President
Educational Testing Service



Wesley W. Walton
Program Director
Educational Testing Service



CONSULTANTS TO THE PROJECT

The following consultants worked closely with the developers of these
materials and offered excellent suggestions which have been incorporated
into the text:

Marvin C. Alkin, Ed.D., Director
Center for the Study of Evaluation
University of California at Los Angeles

Merlynn Bergen, Ph.D., Teaching and Research Assistant
School of Education, Educational Psychology
Stanford University, Stanford

Preston T. Bishop, Ed.D., Consultant
Division of Program Evaluation, Research and Pupil Services
Office of the Los Angeles County Superintendent of Schools

Antonio DePorcel, Ph.D., Senior Research Scientist
Behavioral Science and Technology Program
American Institute for Research, Palo Alto

Annalee Elman, Ph.D., Research Assistant
School of Education, Educational Psychology
Stanford University, Stanford

J. Richard Harsh, M.A., Director
Los Angeles Office
Educational Testing Service

Roger A. Kaufman, Ph.D., Professor
United States International University

Larry E. Orcutt, Ph.D., Independent Consultant
L.E. Orcutt Associates, Incorporated

Dale M. Russell, Ed.D., Consultant
Division of Program Evaluation, Research and Pupil Services

R. Garry Shirts, Ph.D., Consultant
Simile II Corporation

Les E. Shuck, Ed.D., Assistant Superintendent
Research and Development
Newport-Mesa Unified School District

Daniel L. Stufflebeam, Ph.D., Professor
Western Michigan University

Arlene B. Tenenbaum, Ph.D., Consultant
Educational Research Division
Xerox Corporation, Palo Alto



ADVISORY COMMITTEE, DIRECTORS AND STAFF



The following educators served as members of the California Evaluation
Improvement Project State Advisory Committee and provided invaluable direction
and support to the project:

Advisory Committee

• Robert W. Babcock, Ed.D., Director, Evaluation Improvement Center,
  Southern Section, Office of the Los Angeles County Superintendent of
  Schools

• Raymond M. Langley, M.A., Assistant Superintendent, San Luis Obispo
  County Office of Education

• Alexander I. Law, Ph.D., Chief, Office of Program Evaluation and
  Research, California State Department of Education

• Floyd I. Marchus, Ed.D., Superintendent, Contra Costa County Office of
  Education

• Donald A. MacLean, Ph.D., Assistant Superintendent, Orange County
  Office of Education

• Oliver "Bud" Neevlx, M.A., Assistant Superintendent, Shasta County
  Office of Education

• Nelson C. Price, Ed.D., Director, Evaluation Improvement Center, Northern
  Section, San Mateo County Office of Education

• William J. Zachmeier, Ph.D., Assistant Superintendent, Santa Cruz
  County Office of Education

• Donald C. Ziehl, Ed.D., Superintendent, La Canada Unified School
  District

State Department of Education

• William H. Bronson, M.A., State Project Director

• Carolyn M. Fowle, Ed.D., Consultant

Development Centers

• Robert Babcock, Ed.D., Director, Los Angeles County Office of
  Education

    • John Plakos, M.S.

    • Marie E. Plakos, Ed.D.

• Nelson C. Price, Ed.D., Director, San Mateo County Office of Education

    • Patricia Evans, M.A.

    • Carmen J. Finley, Ph.D.

    • Arlen L. Kennedy, M.A.

    • Alice W. Rotzel, Ph.D.

Satellite Centers

• Dean M. Dennett, M.A., Director, Shasta County Office of Education

• Aniello L. Malvetti, M.A., Director, Sacramento County Office of
  Education

    • Marti y./HaJtev, M.A.

• Phyllis L. McKinney, M.A., Director, Orange County Office of Education

    • John Smith, M.A.

    • Jeffrey A. Wells, M.A.

• Glen N. Pierson, Ph.D., Director, San Diego County Office of Education

    • Rodney E. Phillis, Ph.D.

• Thomas Riley, Ph.D., Director, Fresno County Office of Education

• Jack M. Thompson, Ed.D., Director, Sonoma County Office of Education

    • Vincent "Vic" Abata, M.A.

    • Gregory A. Malone, M.A.



ACKNOWLEDGEMENTS 



The initial materials were developed by the following EIP staff members:

• Robert W. Babcock, Ed.D., Director, Los Angeles Evaluation Improvement
  Project Development Center

• Patricia Evans, M.A., Consultant, San Mateo Evaluation Improvement
  Project Development Center

• Carmen J. Finley, Ph.D., Consultant, San Mateo Evaluation Improvement
  Project Development Center

• Arlen L. Kennedy, M.A., Consultant, San Mateo Evaluation Improvement
  Project Development Center

• John Plakos, M.S., Consultant, Los Angeles Evaluation Improvement
  Project Development Center

• Marie E. Plakos, Ed.D., Consultant, Los Angeles Evaluation Improvement
  Project Development Center

• Nels Price, Ed.D., Director, San Mateo Evaluation Improvement
  Project Development Center

• Alice W. Rotzel, Ph.D., Consultant, San Mateo Evaluation Improvement
  Project Development Center

At the end of the first and second years of development, revisions were made
by Carmen J. Finley, Carolyn M. Fowle, and William H. Bronson. Refinements
were based upon extensive interviews with members of the EIP staff throughout
the state who had been using the EIP materials in the conduct of workshops.

Development activities were coordinated by William H. Bronson, M.A., EIP
Project Director, and Carolyn M. Fowle, Ed.D., Project Consultant.

Prepublication revisions were made by Wesley W. Walton, Ed.D., Director of the
Evaluation Improvement Program, ETS. Nathaniel H. Hartshorne and Estelle
Bartels served as ETS editors. Joan Westoff and Terry Birch provided covers
and art supervision. Marissa Q. Burch and Cathy E. Snyder served as
text-processing machine operators.






INFORMATION ABOUT EIP MATERIALS AND WORKSHOPS



Information about ordering Evaluation Improvement Program materials, about
Evaluation Improvement Program workshops that use these materials, or about
making arrangements for specially scheduled EIP workshops for local, regional,
or state inservice training programs may be obtained by writing or telephoning
the Evaluation Improvement Program at Educational Testing Service, Room P-069,
Princeton, NJ 08540, (609) 921-9000, or at any of its regional offices listed
below.



REGIONAL OFFICES OF EDUCATIONAL TESTING SERVICE 



3445 Peachtree Road, NE
Suite 1040
Atlanta, Georgia 30326
(404) 262-7634

3724 Jefferson, Suite 100
Austin, Texas 78731
(512) 452-8817

1947 Center Street
Berkeley, California 94704
(415) 849-0950

960 Grove Street
Evanston, Illinois 60201
(312) 869-7700

2200 Merton Avenue
Room 216
Los Angeles, California 90041
(213) 254-5236

GPO Box 1271
San Juan, Puerto Rico 00936
(809) 763-3636, 3640, or 3760

One Dupont Circle
Suite [?]10
Washington, D.C. 20036
(202) 296-5930

2 Sun Life Executive Park
100 Worcester Road
Wellesley Hills, Massachusetts 02181
(617) 235-8861 or 8860






CONTENTS 



Section A: DETERMINE THE EVALUATION PURPOSES AND REQUIREMENTS

Section B: DEVELOP AN EVALUATION PLAN

Section C: DETERMINE THE EVALUATION DESIGN AND DO THE SAMPLING

Section D: SELECT OR DEVELOP ASSESSMENT INSTRUMENTS

Section E: COLLECT THE DATA

Section F: ANALYZE EVALUATION DATA

Section G: REPORT EVALUATION RESULTS

Section H: APPLY EVALUATION FINDINGS

Section I: SELECTED BIBLIOGRAPHY

Section J: APPENDICES



PROGRAM EVALUATOR'S GUIDE 



Section A 

DETERMINE THE EVALUATION PURPOSES
AND REQUIREMENTS



PRECIS



The success of a program and of its evaluation depends to a great extent upon
how clearly the evaluator understands at the start what things should be like
at the end. If the schools' decision makers are to have confidence in an
evaluator's answers to policy questions about program effectiveness, costs,
and continuance, a number of questions must be asked at the outset and their
answers clarified through the evaluation process. It is critical, then, to
decide early in the evaluation process what purposes the program evaluation
is expected to serve and who will be involved in defining them.

In many programs that are continued from year to year, such early
planning consists of surveying the outcomes of previous years' activities
and determining status and needs in the areas served by the programs. The
evaluator would then ascertain what goals and objectives have been set for
the program and what these mean in terms of program evaluation. The link
between program planning and evaluation planning is in the formulation of
program objectives into terms that are measurable and with respect to which
adequate measurement information can be collected and analyzed to satisfy
end-of-year evaluation requirements.



CONTENTS 



PURPOSES OF PROGRAM EVALUATION
    Communicating with the Public
    Providing Information to Decision Makers
    Improving Existing Programs
    Providing Additional Satisfaction to Participants

OVERVIEW OF THE EVALUATION PROCESS
    Definition of Program Evaluation
    Types of Evaluation Data
    Evaluation as an Ongoing Process

ROLE OF THE PROGRAM EVALUATOR
    External Evaluator
    Internal Evaluator

INITIAL STEPS IN EVALUATION PLANNING
    Find Out What the Program Evaluation Is to Accomplish
    Review Needs Assessment, Program Goals and Objectives
    Separate Objectives Statements from Goals Statements
    Determine That Six Components of Performance Objectives Are Present
    Summary of Evaluation Planning Stages

REQUIREMENTS OF PROGRAM EVALUATION
    Key Questions
    End-of-Year Evaluation
    Interim Evaluation
    Identifying Resources and Constraints

SUMMARY

CHECKLIST OF STEPS IN DETERMINING PURPOSES AND REQUIREMENTS

LEARNING EXERCISE 1: TYPES OF EVALUATION DATA

LEARNING EXERCISE 2: IDENTIFICATION OF MEASURABLE OBJECTIVES

LEARNING EXERCISE 3: SELECTING APPROPRIATE OBJECTIVES

LEARNING EXERCISE 4: MATCHING NEEDS STATEMENTS TO PROGRAM
    OBJECTIVES AND PROGRAM ACTIVITIES






1. PURPOSES OF PROGRAM EVALUATION



In recent years, the pressure on public schools to evaluate and publicize
the results of their educational programs has markedly increased. Response
to this pressure has ranged from enthusiastic compliance to delay and
avoidance. Frequently, evaluation has been envisioned as producing more
risks than gains. Indeed, educators have asked: Is a more thorough and
improved evaluation worth the effort?

Evaluation means different things to different people. Perceptions
may be limited to individual activities such as grading students, rating
teachers, examining test scores [remainder of line illegible]
educational activity.

The primary emphasis of the Evaluation Improvement Program is on the
evaluation of educational programs. Programs, in this context, are defined
as a combination of content, personnel, activities, and resources organized
so as to attain specified goals and objectives. A program may be specific
to an age or grade level, a subject-matter discipline, or a type of service.

Program evaluation can serve different purposes. Four major ones, which
will be discussed in this Section, are:

1. Communicating with the public

2. Providing information to decision makers

3. Improving an existing program

4. Providing additional satisfaction to participants

The reader may perceive additional purposes for program evaluation as he
applies the concept to his own work setting.

Communicating with the Public

Schools play to a number of audiences, each of which makes evaluative
judgments. These judgments usually are based on limited or partial information.
Frequently, a community uses superficial newspaper and/or television reports
as the basis for evaluating the effectiveness of school programs. Note the
following report of reading scores as excerpted from an article published
by the Los Angeles Times of December 3, 1974:





READING SCORES, GRADE 6
California State Testing Program

              '70-'71    '71-'72    '72-'73    '73-'74
              Median     Median     Median     Median
District      %ile       %ile       %ile       %ile

   A            98         89         14          6
   B        [illegible] [illegible]   10          7
   C            97         99         97         96



Note the median percentile scores of three individual districts. Most
likely the diminished reading scores as reported in Districts A and B in
1973-74 caused considerable discontent with the schools on the part of the
citizens of these communities.
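
A reader who wishes to check the year-to-year movement in such a table may find a short calculation helpful. The following is only a minimal sketch in Python, not part of the original Guide. It assumes the percentile values as they can be read from the reproduced table (the two earliest values for District B are illegible and are marked as missing), and the names YEARS, MEDIAN_PERCENTILES, year_over_year_changes, and latest_change are hypothetical, chosen for illustration.

```python
# Median percentile ranks as legible in the table above. This is an
# assumption: the reproduction is damaged, and None marks cells that
# could not be read.
YEARS = ["1970-71", "1971-72", "1972-73", "1973-74"]
MEDIAN_PERCENTILES = {
    "A": [98, 89, 14, 6],
    "B": [None, None, 10, 7],
    "C": [97, 99, 97, 96],
}


def year_over_year_changes(scores):
    """Return the change between consecutive years; None where a value is missing."""
    changes = []
    for earlier, later in zip(scores, scores[1:]):
        if earlier is None or later is None:
            changes.append(None)
        else:
            changes.append(later - earlier)
    return changes


def latest_change(scores):
    """Return the change into the most recent year, or None if it cannot be computed."""
    return year_over_year_changes(scores)[-1]


if __name__ == "__main__":
    # Print each district with its list of changes and its latest change.
    for district, scores in MEDIAN_PERCENTILES.items():
        print(district, year_over_year_changes(scores), latest_change(scores))
```

A single figure of this kind is exactly the sort of partial information the text cautions about: it shows that a median moved, but nothing about which program objectives were or were not realized.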

The public also receives information about school programs from students
in the family and from other informal settings. This information may or may
not be biased; however, judgments are nevertheless made based on information
gleaned from such sources.

In summary, the public frequently derives its opinion of the efficacy
of the educational system through partial or, at times, biased information.
These judgments affect the extent of financial support for schools, the
degree of freedom of instruction, and the self-esteem of educators. As a
consequence, the opportunities available to learners may be positively or
negatively influenced.

It is, therefore, beneficial to educators, to the schools and their
programs, to supply the public with comprehensive information, the best
that can be pulled together. Reports to the public should be based on a
full range of program objectives and should show the extent to which the
objectives were realized. When this is accomplished, the public will be
able to make more informed judgments about the effectiveness of school
programs and what is needed to gain support for them.

One must remember that within the public there are a number of audiences,
and each has unique needs for information. You should identify these various
audiences and ascertain the questions they may raise about current educational
programs. The audiences and their questions each need to be addressed in the
program evaluation. Within the general public, one might identify (1) parents,
(2) teachers, (3) students, (4) the business community, (5) the industrial
community, (6) the professional community, and (7) the retirement community as
somewhat separate audiences.



Providing Information to Decision Makers

Judgments made by school personnel are often critical and apt to have an
immediate impact. Program evaluation, then, can be helpful in making ongoing
decisions. The information it produces may be applicable through all
phases of educational management, ranging from needs assessment through program
planning and implementation to the adjustment of objectives before repeating a
program.

Educators often approach educational planning with nothing more than an
intuitive sense of needs. They may proceed without validating these needs in
the local setting. Likewise, many educators will have programs, objectives,
and plans in mind without establishing their appropriateness for filling the
needs which have been identified. To increase program effectiveness,
educators need to ascertain what needs exist and determine what programs will
best meet those needs.

Planning for program evaluation is an integral part of planning that
program. Such preparation can serve to assure a continuing focus on the
most important objectives and steady progress towards their achievement.
Decisions to adopt, modify, expand, or discontinue a program or its parts
are made throughout the stages of its development. If useful
information is not available, arbitrary decisions will be made. With
evaluation information, the quality of decisions and acceptance of changes by
those involved will be improved. Systematic evaluation provides a sound basis
for the decisions that are reached.






Decision makers, of course, are to be found at various levels within
the school. The teacher is a decision maker within the classroom, the
principal within the school, and the superintendent and board of education
within the district. One must consider needs at each of these levels of
decision making when gathering the needed information and developing the
evaluation plan.

Improving Existing Programs

An effective program evaluation system can help ongoing programs operate
more effectively by providing feedback to staff on what is happening.
Frequently, the instruments used to assess program results also can be
used to diagnose individual instructional needs. Lacking this information,
the teacher's solution may tend toward the same instruction for everyone.
With it, the teacher can individualize instruction to meet each student's
needs. Relevant program evaluation information may make possible a greater
degree of individualization of instruction and also more effective groupings
of students for instructional purposes.

Educational programs evolve and change over time as students and their
needs also evolve and change. Information about the effect of different
aspects of a program on students may enable the staff to identify the
factors which may need modification as the program proceeds. In a school
system, there are persons at different levels who have access to different
resources and who will take different actions in their attempts to improve
ongoing programs. The teacher may adjust instructional methods; the
principal may assign new personnel and/or resource materials; the board of
education and superintendent may grant additional financial support. Each
makes unique contributions and therefore has unique needs for evaluation
information.

Providing Additional Satisfaction to Participants

Program evaluation differs from individual evaluation in that it measures
objectives which apply to groups of persons, perhaps by grade level,
academic department, or an entire school or a group of schools. Therefore,
evaluation of programs can be conducted in a context of mutual help rather
than posing individual threats, as sometimes occurs in teacher evaluation,
or imposing too much testing, as sometimes occurs in student evaluation.
Assessment of common objectives usually generates a sense of unity and
growth. Program evaluation offers maximum benefits and minimum burdens
for all in the schools.

Program evaluation may be designed to give useful information to
students as well as to program managers. Especially if a student's
progress as shown in a program evaluation is compared to his own previous
performance, he is likely to see progress and feel positive about his
growth. The beneficiaries in such a situation clearly are the students and
the instructional staff. Questions they would like answered are important
and most assuredly should be built in as part of the evaluation plan.



2. OVERVIEW OF THE EVALUATION PROCESS

Definition of Program Evaluation

Program evaluation is defined here as the process of determining the value
or effectiveness of an activity for the purpose of decision making. The
key words in this definition are (1) value, (2) effectiveness, and (3)
decision making.

1. Value. When a program evaluation takes place, the
   decision maker is concerned with determining the
   net value of something, its costs in relation to
   its benefits. Both costs and benefits have to be
   measured in terms of human factors and dollars.

2. Effectiveness. The decision maker needs to know to
   what extent a particular program was effective in
   meeting identified needs or objectives. Measures of
   effectiveness tell the decision maker what difference
   the program has made.

3. Decision making. A person with program responsibilities
   needs information on value and effectiveness which is
   useful in deciding what to do next: to continue, modify,
   or drop a program. The purpose of program evaluation is
   to improve the quality of the program decisions reached.

The evaluation process divides itself into three major phases:
(1) Planning, (2) Conducting, and (3) Using. Each of the phases
has distinct components. See the chart below:

                      THE EVALUATION PROCESS

PLAN                          CONDUCT            USE

• Determine Evaluation        • Collect Data     • Report Results
  Purpose and Objectives

• Develop the General         • Analyze Data     • Apply Evaluation
  Evaluation Plan                                  Findings

• Determine the Specific
  Evaluation Design

• Obtain Assessment
  Tools

This Guide is based on these eight components, with a section devoted
to each. This section focuses on the first step, "Determine the Evaluation
Purpose and Objectives."

Types of Evaluation Data

There are two basic types of program evaluation with which educators are
concerned. They are formative and summative.

• Formative evaluation takes place during the development of a program
  or instructional unit. It is concerned with fine tuning the
  implementation processes and measuring learner progress as the program
  moves toward the attainment of specified objectives. Thus, formative
  evaluation provides the decision maker with information during the
  course of program development and execution for possible midcourse
  corrections to help assure that the program objectives are eventually
  met in an effective and economical fashion.

• Summative evaluation takes place at the end of a program or an
  instructional unit. This type of evaluation is concerned with
  measuring levels of learner achievement and the success (or failure)
  of operational procedures.

The two types of evaluation, along with three kinds of evaluation data which
can be gathered for each, can be visualized as follows:

                        TYPES OF EVALUATION DATA

              PRODUCT DATA        PROCESS DATA              CONTEXT DATA
              (Learner Changes)   (Supportive Activities)   (Learning Environment)

FORMATIVE
(Interim)

SUMMATIVE
(End of Cycle)

Thus, formative and summative evaluation may include product, process, and
context data, all three of which may be collected during a program cycle or
at the end of a given program period.

• Product data focus on the outcomes, results, or products of program
  activity. The purpose of collecting such information is to measure
  and assess status and accomplishments at the start, during, and at
  the end of the program. Sometimes postprogram follow-up is also
  done. Product data should be related to established program goals
  and objectives. For example, an end-of-the-year summative evaluation
  of a pilot career education program for grades 11 and 12, with
  the goal to develop job interview skills for students, might show
  that 75 percent of the job-interview performance objectives were
  successfully accomplished by 75 percent of the students.



• Process data focus on the activities and procedures applied to the
  achievement of the desired outcomes. The purpose of collecting such
  information is to provide measurements and assessments which will
  help determine the effectiveness of the various things done in the
  operation of a program. Process data make it possible to monitor
  an activity or program to identify and/or predict procedural
  difficulties before they loom large. For example, early in the program,
  the decision maker may wish to know whether the teachers and aides
  are implementing the program's instructional activities as agreed.

  Process data gathered with a formative purpose can help keep a
  program on track. Gathered with a summative purpose, they can help
  in understanding what really happened in the program, after a key
  benchmark (i.e., end of year) has passed.

• Context data describe the environment in which the program activities
  are taking place. This might include facilities, equipment, supplies,
  rules and policies, class organization, teacher skills and behaviors,
  attitude and support of the principal toward the program, discipline,
  and scheduling.

  Context data are useful in making judgments on whether program
  objectives are feasible. They also serve to identify variables
  that may keep the program from meeting its performance objectives,
  such as a school principal whose attitude will impede a special
  program imposed on his school. This would be a serious obstacle to
  the success of the program and would need to be addressed lest the
  program fail in a starved environment.

An example of each type of information is presented below:

• Product data: The students in the experimental reading program have
  shown a mean gain of 10 months for every 6 months of instruction.

• Process data: The teachers and the aides have carried out all the
  enrichment program activities as planned.

• Context data: The textbooks arrived two months late, resulting in a
  delay in the implementation of the Career Education Program.
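
The product-data figures quoted in this section (75 percent of the
objectives accomplished by 75 percent of the students; a gain of 10 months
for every 6 months of instruction) are simple ratios. The following is a
minimal sketch of how such figures might be tallied, not a procedure taken
from this Guide. The function names, the record layout, and the sample
numbers are all hypothetical.

```python
def share_meeting_threshold(objectives_met, total_objectives, threshold=0.75):
    """Fraction of students who accomplished at least `threshold` of the objectives.

    objectives_met maps a student identifier to the number of objectives
    that student accomplished (hypothetical record layout).
    """
    if total_objectives <= 0 or not objectives_met:
        raise ValueError("need at least one objective and one student")
    passing = sum(
        1 for met in objectives_met.values()
        if met / total_objectives >= threshold
    )
    return passing / len(objectives_met)


def gain_rate(months_gained, months_of_instruction):
    """Months of achievement gained per month of instruction."""
    if months_of_instruction <= 0:
        raise ValueError("months_of_instruction must be positive")
    return months_gained / months_of_instruction


# Hypothetical data: how many of 8 job-interview objectives each student met.
sample = {"student_a": 8, "student_b": 6, "student_c": 7, "student_d": 3}

share = share_meeting_threshold(sample, total_objectives=8)
print(f"{share:.0%} of students met at least 75% of the objectives")
print(f"Gain rate: {gain_rate(10, 6):.2f} months per month of instruction")
```

A gain rate above 1.0 means learners are progressing faster than one month
of achievement per month of instruction.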



Learning Exercise 1 on page A-25 provides help in understanding types of
evaluation data.



Evaluation as an Ongoing Process

Program evaluation is continuing and ongoing. It occurs at the start, during,
and after a program has been run. One may consider evaluation as the nucleus
of the program, for it interacts with the program's needs assessment, its
statement of goals and objectives, its program planning, and implementation.
Evaluation data from the current run of a program become needs data for
the next run. The entire activity is cyclic, as illustrated below:




The needs assessment gives direction for the development of program goals
and objectives; program activities are developed to accomplish the stated
program objectives. Carrying them out produces results of one sort or
another. The evaluation process interacts with all these stages, and
suggests directions for new plans and actions.

The display above suggests that there is constant feedback and revision
between program evaluation and each of the four areas. One quickly notes
that program evaluation cannot exist as a separate entity; it must be
developed as an integral part of the program. Some of the relationships
are shown in the following table:



             Evaluation Is an Integral Part of a Program

                Needs               What needs can you cite that justify
                Assessment          the existence of this program?

What?           Program Goals and   To what needs are the goals of the
                Objectives          program related?

                                    Of what goals are the objectives of
                                    the program a part?

How?            Program             What activities will most likely
                Activities          meet the objectives?

                                    How will you plan and carry out
                                    activities that will accomplish
                                    the objectives?

How Will        Program             What kinds of information should be
You Know?       Evaluation          gathered to determine if the activities
                                    are reaching the objectives and
                                    consequently meeting the needs?



3. ROLE OF THE PROGRAM EVALUATOR

The role of the program evaluator may be perceived in a number of ways.
One view is that of an external person who is called in to assess and
verify program success or failure and who will certify to a particular
audience that a particular program did or did not attain a specified degree
of success. This person may also be seen as an objective and unbiased
observer as well as one who may have new insights not readily apparent to
those who have been close to the program.

Another view is that of an internal person who is part of the program
and whose primary function is to work closely with program staff on
evaluation matters. Together they gather information that can be used in
improving the day-to-day operation of the program and in learning at the
end what happened.

While these two roles are not necessarily mutually exclusive, the
emphasis is sufficiently different that the kinds of information they gather
and the reports they make are probably significantly different. What one
sees as the function of the evaluator is directly related to what one sees
as the purposes of the evaluation. Most programs would benefit from an
evaluator who is oriented to neither view but is able to incorporate elements
of both in his or her evaluation. Some advantages and disadvantages of the
internal and external models are listed below.

External Evaluator

Advantages:

1. Probably has more competence in program evaluation techniques.

2. Brings the objectivity of an outside observer.

3. Probably has no vested interest in program outcome.

4. Takes on the major part of the evaluation burden from the
   existing staff.

Disadvantages:

1. Will take longer to understand a program and the evaluation
   requirements.

2. Lacks ongoing working relationship with program staff,
   school and district personnel.

3. Seen as an outsider by program staff.

4. Time schedule of evaluator may not always match local
   needs.

Internal Evaluator

Advantages:

1. Is apt to be more familiar with the total school setting.

2. Has established working relationship with program staff.

3. Understands channels of communications within the school,
   the school district and the community.

4. Is familiar with all details of the program.

5. Has a personal interest in the success of the program.

Disadvantages:

1. May have a vested interest in program outcome.

2. May reflect bias of program staff in the design and report.

3. May be overburdened by other duties and unable to devote
   adequate time to the program evaluation.

4. May not have skills required in evaluation.



4. INITIAL STEPS IN EVALUATION PLANNING

Find Out What the Program Evaluation Is to Accomplish

Program evaluation is frequently thought to be a sequence of activities
such as choosing assessment instruments, collecting and analyzing the data
obtained by the instruments, and reporting the results. However, good
program evaluation consists of much more than that. In developing an
evaluation plan, one must ask:

• What are the questions the program planner wishes to have
  answered?

• What is expected of the program by the different publics, such as
  students, instructional staff, the principal, the citizens' advisory
  committee, the superintendent, and the board of education?

• What questions does each of the groups served want answered as a
  result of the program evaluation?

• What kind of information will each audience accept as evidence for
  answers to its questions?

Only after addressing such questions can one begin to plan program-evaluation
strategies.

Review Needs Assessment, Program Goals and Objectives

One of the first things that an evaluator should do is to become well
acquainted with the program to be evaluated. If the evaluation has been part
of program planning, the evaluator probably will have been involved in
planning the program from its inception. He or she has probably become familiar
with learner, educator, and community needs when these were identified. If
the evaluator is a more recent arrival, he or she has probably been thoroughly
briefed about the program. In situations when evaluation has not been a
central part of program plans, an evaluator frequently is not called in until
the program has been developed and is about to be implemented. At that time,
the evaluator may be provided with a copy of the program document and asked
to develop a design and plan for a program evaluation.

If you as an evaluator are faced with this latter situation, be certain
that you are provided with the record of previous planning before beginning
your task.

As an evaluator, you will want to learn how objectives were set and
how they were seen to be related to existing needs. It would be important
to document whether all who have an effect on pupil learning were given the
opportunity to recommend and to set priorities for objectives to be addressed
by the instructional program.

The evaluator's next task would be to review the program goals and
objectives to satisfy himself or herself that they relate directly to
identified needs and to each other. Part of this task is to see that
the objectives are stated in unambiguous terms and that it is uniformly
understood what results are expected from the program. This will help the
program manager, the program evaluator, and the staff to reach agreement on
the direction in which the program should be moving and what it should be
accomplishing.

The general responsibility of writing objectives lies with the program
director; however, if the objectives are not clear, it is the evaluator's
role to seek clarification. At times it may be necessary for the evaluator
to assist in the rewriting of the objectives or to suggest alternative
statements.

Clearly stated performance objectives are the key to the evaluation
process. This is because the objectives set the stage for deciding the
nature of the evaluation to be performed, the kind of design to be developed,
the types of instruments to be used, the resources required to perform the
evaluation, and the style and organization of the evaluation reports. It is
imperative that the objectives are carefully formulated, that they relate
recognizably to program goals and needs statements, and are framed in ways
that make them measurable.



Separate Objectives Statements from Goals Statements

The ability to set meaningful goals and objectives is a very valuable skill
which, when applied correctly, will help ensure success in both the program
and program evaluation. Goals and objectives can help answer the questions,
"What do we want to accomplish?" and "How will we know when we have
accomplished it?" A goal is usually a general statement of the long-term results
that one might hope to reach or come close to reaching. An objective is
developed to reflect a specific outcome of goal-related efforts and is
stated in terms of measurable changes or improvements that are expected.
The main difference between the two is that an objective by its definition
is measurable, whereas a goal is seldom stated in measurable terms.

The following chart summarizes the differences between the two:

                    GOALS AND OBJECTIVES ARE
                         NOT THE SAME

Goals                            Objectives

• Broad in scope                 • Define intent

• General statements             • Define expected outcomes
  of aspiration                    in measurable terms

• Long-term or                   • Capable of being accomplished
  far-reaching                     within a specified time frame
Determine That Six Components of Performance Objectives Are Present

A well-stated performance objective contains six components that will answer
the following questions:

• Who?

• Learns or does what?

• When?

• Under what conditions?

• At what performance level?

• How will it be measured?

The Who relates to the person who performs an activity. The Learns or
does what is the activity to be performed. When, Under what conditions, and
At what performance level relate to time and performance conditions. How
will it be measured relates to assessment techniques.
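
The six-component check described above can be treated as a completeness
test on a draft objective. The following is a minimal sketch under stated
assumptions: the class name, field names, and the sample objective are
hypothetical and serve only to illustrate the idea of flagging missing
components before an objective is accepted for evaluation.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PerformanceObjective:
    """One field per question a well-stated performance objective answers."""
    who: Optional[str] = None                  # Who?
    learns_or_does_what: Optional[str] = None  # Learns or does what?
    when: Optional[str] = None                 # When?
    conditions: Optional[str] = None           # Under what conditions?
    performance_level: Optional[str] = None    # At what performance level?
    measurement: Optional[str] = None          # How will it be measured?


def missing_components(objective: PerformanceObjective) -> List[str]:
    """Return the names of components that are absent or left empty."""
    return [f.name for f in fields(objective) if not getattr(objective, f.name)]


# Hypothetical draft objective with two components left unstated.
draft = PerformanceObjective(
    who="Grade 3 students in the reading program",
    learns_or_does_what="read a grade-level passage aloud",
    when="by the end of the school year",
    performance_level="with no more than five errors",
)

print(missing_components(draft))  # ['conditions', 'measurement']
```

An empty list would indicate that all six questions have been answered; it
says nothing about whether the answers themselves are sound, which remains a
matter of judgment for the evaluator and program director.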

Summary of Evaluation Planning Stages

The program evaluator begins by reviewing the needs statements and
formulations of program goals and objectives. The second step is to review the
program activities to determine how they are to meet the stated objectives.
If the activities seem not to match the objectives, the evaluator should
recommend to the program manager that they be reviewed for possible revision,
or that the objectives be changed. The learning exercises at the end of
this section are designed to reinforce the information in these pages.
Learning Exercise 2 (A-27) provides help in the identification of measurable
objectives. Learning Exercise 3 (A-29) provides help in relating specific
objectives to given goals and in rating their relative importance. Learning
Exercise 4 (A-39) will be useful in understanding the relationships among
needs, objectives, and activities.



5. REQUIREMENTS OF PROGRAM EVALUATION

Key Questions

The ultimate requirement of evaluation is to serve the needs of the
audiences to which the program director is accountable. To make certain
that both interim or formative and end-of-the-year or summative evaluation
are efficient and provide the best possible information for program use, the
evaluator should address the following questions early in the evaluation
planning stage:

Who requires information? Generally speaking, everyone who has a
responsibility for some phase of the program is a decision maker, and he
or she will require evaluative information. The evaluator should identify
decision makers as early as possible and make personal contact with each to
stress the desirability of working closely together throughout the program
period and to learn about their information requirements.

What decision-making information is required? During the program
period, various staff members with program responsibilities will require
information in areas of concern to them. Since responsibilities are not
always clear-cut, the decision makers must tell the evaluator which program
elements are of interest and what types of information are needed for each.
Unless the evaluator has this information, he or she may not be able to
provide sufficiently useful data to decision makers. Ideally, to ensure
that information collected and analyzed will be as meaningful as possible,
the decision maker should formulate questions related to program objectives
or concerns that the evaluator should address. Questions submitted by
decision makers then can be translated into functional terms for inclusion
in the evaluation design, the data-collection instruments, the data-analysis
plan, and in the outline of the evaluation report.

When is the information required? Information gathered needs to be
both instructive and timely. To assure that program-evaluation reports
are submitted when required, a reporting timeline should be developed.

On the following page is a form on which information requirements for
program evaluation may be recorded. The evaluator should complete this form
as he or she meets with each decision maker to determine his or her
information requirements. On page A-19 there is a sample Evaluation Information
Requirements Form that has been filled out.
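
The four columns of the Evaluation Information Requirements Form, together
with the reporting timeline mentioned above, lend themselves to a simple
record structure. The sketch below is illustrative only and makes several
assumptions: the class and function names are hypothetical, the entries are
invented, and the calendar years are assumed because a form of this kind
typically records only month and day.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class InformationRequirement:
    """One row of the form: who, what, when, and for what use."""
    decision_maker: str   # Who requires information?
    information: str      # What information is required?
    date_required: date   # Date required
    use: str              # Use to be made of information


def reporting_timeline(
    requirements: List[InformationRequirement],
) -> List[InformationRequirement]:
    """Order the requirements by due date to form a reporting timeline."""
    return sorted(requirements, key=lambda r: r.date_required)


# Hypothetical entries; years are assumed for illustration.
rows = [
    InformationRequirement("Principal", "Classroom observation reports",
                           date(1976, 11, 15), "Check implementation"),
    InformationRequirement("Principal", "Reactions to inservice training",
                           date(1976, 10, 1), "Revise training if required"),
    InformationRequirement("Principal", "Student achievement progress",
                           date(1977, 1, 15), "Check rate of achievement"),
]

for row in reporting_timeline(rows):
    print(f"{row.date_required:%b %d, %Y}  {row.decision_maker}: {row.information}")
```

Sorting by due date is only one reasonable view; grouping by decision maker
would serve equally well when several audiences are involved.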



A-18

Interim Report ______              Program ______________
End-of-the-Year Report ______      Program Director ______________
                                   District ______________

              EVALUATION INFORMATION REQUIREMENTS FORM

       1                   2                  3               4
Who Requires       What Information      Date           Use to Be Made
Information?       Is Required?          Required?      of Information



A-19

Interim Report                     Program: ERPHI
End-of-the-Year Report             Program Director: T.H. Collins
                                   District: Rosedale

              EVALUATION INFORMATION REQUIREMENTS FORM

Who Requires     What Information Is        Date        Use to Be Made
Information?     Required?                  Required?   of Information

Dr. Marie        Reports on observations    Nov. 15     To determine if
Thomson,         of classroom activities    Dec. 15     the instructional
Principal                                   Jan. 15     program has been
                                                        implemented as
                                                        planned

                 Teacher-paraprofessional   Oct. 1      To determine
                 reactions to inservice                 effectiveness of
                 training program                       the inservice
                                                        training program
                                                        for revising
                                                        program if
                                                        required

                 Student progress in        Jan. 15     To determine
                 reading and mathematics                whether students
                 achievement                            are achieving at
                                                        the expected
                                                        rates



End-of-Year Evaluation

End-of-the-year, or summative, evaluation is critical to decision makers who
must decide whether to continue, modify, expand, or discontinue the program.
It also serves to identify needs to be addressed in planning the
instructional program for the following year.

The end-of-the-year evaluation should answer explicitly those questions
that were designed into the evaluation plan at the beginning of the year.
It is important, therefore, that the evaluator review the plans for program
evaluation prior to program implementation to be certain that all the data
which will be required will be available for the end-of-the-year evaluation.

Interim Evaluation

Interim, or formative, evaluation allows decision makers to determine how
well program objectives are being met while the program is ongoing and to
decide what to do to improve program activities in progress. It is a
viable tool for controlling and fine tuning the program.

Care must be taken to evaluate program activities as well as to measure
progress towards program objectives. With effective monitoring, deviations
from planned activities can be identified immediately and corrected before
they adversely affect program outcomes.

Identifying Resources and Constraints

Resources and constraints should be identified during the early phase of
evaluation planning. This is important, for it will bring to light the
resources needed and those currently available, and will enable decisions
to be made on whether to add resources where there are shortages or to get
along under constraints. Some resources to consider include:

1. The amount of money budgeted for the evaluation

2. The amount of personnel time available for data
   collection and record keeping

3. The services available from other agencies
   (i.e., district, county, and/or state)

4. The instruments currently being administered
   for other purposes that can provide some of
   the program-evaluation data

Later in the planning period, when the evaluation plans and procedures
have been determined, specific requirements will be identified. A
match/mismatch comparison between resources available and those required
should be made. As discrepancies are identified, the program director and
staff, with the assistance of the evaluator, will be in a position to
determine the manner in which the discrepancies can best be handled.
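
The comparison of resources available against resources required amounts to
listing the shortfalls. The following is a minimal sketch, assuming that
each resource can be expressed as a single number (dollars, staff hours, and
so on); the function name, resource names, and quantities are hypothetical.

```python
def resource_discrepancies(available, required):
    """Return {resource: shortfall} for each requirement not fully covered.

    Both arguments map a resource name to a quantity in the same unit.
    A resource missing from `available` is treated as having none on hand.
    """
    gaps = {}
    for name, needed in required.items():
        shortfall = needed - available.get(name, 0)
        if shortfall > 0:
            gaps[name] = shortfall
    return gaps


# Hypothetical figures for one evaluation plan.
available = {"budget_dollars": 5000, "staff_hours": 120}
required = {"budget_dollars": 6500, "staff_hours": 100, "scoring_services": 1}

print(resource_discrepancies(available, required))
# {'budget_dollars': 1500, 'scoring_services': 1}
```

How each listed shortfall is resolved (adding resources, changing the
requirement, or accepting the constraint) remains a decision for the program
director and staff.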

One method of resolving a constraining factor is to change the
requirement. Another is to create new ways to meet it. At times, a compromise
may be reached with the decision maker as to how much he or she is willing
to sacrifice in order to achieve a given feature of a program evaluation.
There is also the alternative of eliminating the interim evaluation so that
the available resources can be focused upon the longer-range concerns of
the end-of-year evaluation. Constraints become evident as planning proceeds.
When such problems come to light, the evaluator's advice on alternative
solutions is the key to balancing high-level resource requirements against
ever-present constraints.

If the program receives categorical funding from state or federal
sources, note must be made of the reporting requirements of these agencies.
Such external requirements should be combined with local requirements to
define the total evaluation requirement. Data for both purposes should be
collected and treated as a unit. Duplication throughout the evaluation
effort, such as double-data collection, should be avoided at all cost.



SUMMARY

To determine evaluation purposes and requirements, the evaluator:

1. Reviews program records of outcomes of previous program
   implementation.

2. Determines how the learner, educator, and/or community needs
   were identified.

3. Determines that program goals match identified needs.

4. Determines that performance objectives are compatible with
   program goals and needs statements.

5. Determines that performance objectives are written in
   measurable terms.

6. Reviews program activities to be certain that they relate
   to performance objectives.

7. Determines end-of-the-year and interim evaluation requirements.

8. Identifies preliminary evaluation resources and constraints.

9. Develops a composite list of resource requirements and submits
   the list for staff concurrence and board of education approval.




CHECKLIST OF THE STEPS IN DETERMINING
EVALUATION PURPOSES AND REQUIREMENTS

                                                     Step       Date
                                                     Started    Completed

DEFINE EVALUATION PURPOSE

• Determine from decision makers the purpose of      _______    _______
  the evaluation. The purpose will dictate the
  types of evaluation that must be conducted.

REVIEW NEEDS ASSESSMENT, PROGRAM GOALS AND PROGRAM
OBJECTIVES

• Determine whether a needs assessment was           _______    _______
  conducted. [illegible] the decision makers.

• Review program goals to determine whether          _______    _______
  they address the needs or problem areas.

• Review performance objectives to determine         _______    _______
  that they are compatible with program goals.
  Are the objectives stated in unambiguous terms?

REVIEW PROGRAM ACTIVITIES

• Review program activities to judge whether         _______    _______
  they can be expected to contribute to
  achievement of the objectives.

• If the activities do not match the objectives,     _______    _______
  recommend that activities or objectives be
  revised.

IDENTIFY EVALUATION REQUIREMENTS

• Request that decision makers identify the          _______    _______
  information they will require to make
  end-of-the-year decisions about the program.

• Determine process, product, and context data       _______    _______
  that should be collected.

• Determine the information required by decision     _______    _______
  makers to make interim decisions.

• Determine when the information is required.        _______    _______

IDENTIFY EVALUATION RESOURCES AND CONSTRAINTS

• Determine the resources and constraints which      _______    _______
  will affect the conduct of the evaluation.

• Advise decision makers of those resources          _______    _______
  which are available and those that are
  required.

• Submit recommendations to decision makers          _______    _______
  for reconciling discrepancies between
  resources available and those required.




LEARNING EXERCISE 1: TYPES OF EVALUATION DATA

Directions: Classify the following
examples by type of evaluation data
to be collected.

Types of Evaluation Data

Product - PT
Process - PS
Context - C

Write the abbreviations for these types
in the spaces provided.

_____ Example I: The end-of-the-year evaluation indicated that
      three of the four program performance objectives were met.

_____ Example II: Two of the seven teachers developed their own
      math materials instead of using those prescribed for the
      program.

_____ Example III: After the second week of school, all the
      teachers went on strike.

_____ Example IV: It was determined at the end-of-the-year
      evaluation that 45 of the 46 instructional activities
      were implemented as planned.



A-26

Learning Exercise 1

ANSWERS

Example I: PRODUCT

Explanation: When we speak of performance objectives, we
are speaking of learner progress or outcomes.

Example II: PROCESS

Explanation: In this example, the evaluator is looking at
the activities implemented to support learner progress or
outcome.

Example III: CONTEXT

Explanation: The teachers' strike is a condition which
interrupted the instructional program design and which
could have kept the program from meeting its performance
objectives.

Example IV: PROCESS

Explanation: As in Example II, the instructional activities
were designed to support the achievement of the desired
pupil outcomes.



Learning Exercise 2

LEARNING EXERCISE 2: IDENTIFICATION OF MEASURABLE OBJECTIVES

The following are partial statements of performance objectives.
Check whether or not these statements are written in terms that
are measurable.

YES  NO

___  ___   1. He will be able to understand the
              principles of citizenship.

___  ___   2. Each child will be able to name the
              days of the week in order beginning
              with Sunday.

___  ___   3. The students will appreciate the
              culture of their northern neighbor.

___  ___   4. The students will construct a log cabin.

___  ___   5. The teachers will learn the significance
              of the experience.

___  ___   6. Children will enjoy going to the library.

___  ___   7. Parents will become aware of their need
              to participate in the school program.

___  ___   8. Students will grasp the concept of good
              citizenship.

___  ___   9. Each child will write a composition.

___  ___  10. The teacher's assistant must know the
              teaching philosophy of Head Start.



.^Learning Exercise 2 
A-28 

Do the following statements contain the six (6) components
found in a performance objective? Be prepared to identify
any components that might be missing.

YES NO

11. All 2nd grade students receiving
remedial math instruction will
show a gain of five months in math
computations for every five months
of instruction. Gain will be
measured by the state-approved test.

12. The students will show a six months'
growth in reading comprehension as a
result of the remedial reading program.






Learning Exercise 3

A-29

LEARNING EXERCISE 3: SELECTING APPROPRIATE OBJECTIVES

This exercise is designed to give you practice in deciding which objectives
will best measure progress towards a specific goal. Four different goals
are presented. They deal with students' achievement; motivation and
commitment of students, staff, and community; individualized instruction; and
self concept. A list of 11 possible objectives that could be used to assess
progress towards the goals is also given. Your task is to take one of the four
goal areas and decide which objectives would be most appropriate for evaluating
that program goal.

Instructions for Each Panel of Consultants

Suppose that you and other members of your group are consultants selected by
the school board to form an advisory evaluation panel. The main task of the
panel is to select a set of objectives appropriate to evaluate the goals
stated on the "Reading Program Goals Sheet" (A-30). Include only the most
important objectives: those you think the district can reasonably afford to
pursue and evaluate.

At the end of the panel's meeting, it is expected that the panelists
will have:

1. Read the "Brief Description of the Program" and selected
one of the four program goals (A-30).

2. Selected from the list of 11 objectives (A-31-33) those
most of the members agree are adequate for determining
whether the goal selected has been achieved. If the
panel members are not satisfied with those listed, they
should have developed their own set of objectives (A-34).

3. Individually rated each selected objective using the
worksheet (A-36).

4. Tallied the selected objectives according to the ratings
assigned by the panel (A-37).



Learning Exercise 3

READING PROGRAM GOALS SHEET

Brief Description of the Program

This program is a reading performance contract project funded by the state.
The program is located in one of the junior high schools of an urban school
district which has shown a great need for special reading instruction. The
emphasis of the program is placed on individualized reading instruction.
Two main components are: (1) the diagnosis of reading needs of individual
students; and (2) the careful planning of reading instruction according to
the diagnosis.

Teachers in the program have been given preservice training and will
receive inservice training in the use of individualized instruction
techniques.

The program is in its first year of operation with contracts between
the school district and the teachers working in the program.

Program Goals

Four of the goals of the program stated in the contract are as follows: 

I. PARTICIPATING STUDENTS WILL RAISE THEIR READING ACHIEVEMENT
PERFORMANCE.

II. ADMINISTRATIVE STAFF, TEACHING STAFF, STUDENTS, AND MEMBERS OF
THE COMMUNITY WILL DEMONSTRATE THAT THEY WERE HIGHLY MOTIVATED
AND HIGHLY COMMITTED TO A SUCCESSFUL IMPLEMENTATION OF THE
READING PROGRAM.

III. INDIVIDUALIZED INSTRUCTION TECHNIQUES WILL BE USED AS THE MAJOR
TEACHING STRATEGY IN THE IMPLEMENTATION OF THE INSTRUCTIONAL
PART OF THE READING PROGRAM.

IV. PARTICIPATING STUDENTS IN THE READING PROGRAM WILL RAISE THEIR
SELF CONCEPTS.




Learning Exercise 3

A-31

LIST OF READING PROGRAM OBJECTIVES

The planners of the program, together with the designated program evaluator,
have developed the following list of program objectives. These objectives
are examples which may or may not be appropriate to evaluate one or more of
the program goals:

1. All participating students with self concepts below
the 30th percentile as measured on a standardized
inventory on a pretest will show a gain of at least
ten percentile points towards positive self concept
as measured by the same instrument at the end of
the eighth month in the special reading program.

2. The teaching staff will assess reading skills and
design individualized reading activities for each
participating student at the beginning and at
one-month intervals during the special reading
program. The fulfillment of this objective will
be measured by the extent of the entries in the
locally developed "Student Activities Diary."

3. At the end of the eighth month of the special
reading program, at least 95 percent of the teaching
staff will have participated in three-fourths or
more of the supplementary instructional activities
(e.g., inservice sessions, staff meetings) designated
in the program. The fulfillment of this
objective will be measured by the tallying of
attendance in an attendance log.

4. At the end of the eighth month of the special
reading program, at least 70 percent of the
administrative staff will have participated in
one-fourth or more of the supplementary activities
(e.g., inservice sessions, staff meetings)






Learning Exercise 3 



A-32 



" designated in the program plan. The fulfillment 
of this objective will be measured by the tallying 
of attendance in an attendance log. 

5. Fifty percent of the participating students will
show a gain of 15 percentile points or more in
reading achievement as measured by a standardized,
norm-referenced reading achievement test at the
end of the eighth month of the special reading
program compared with the results of the same
test administered at the beginning of the program.

6. At the end of the eighth month of the special
reading program, 70 percent of the participating
students will respond correctly to three-fourths
or more of the questions on a criterion-referenced
test.

7. At the end of the eight-months period, the teaching
staff will reflect a measured mean score of 4 or
higher on a rating scale with a designated low score
of 1 to a designated high score of 5 indicating the
extent to which they were personally committed to
the successful implementation of the special reading
program.

8. When responses are solicited at the end of the eight-
months period, participating students will show a
measured mean score of 4 or higher on a rating scale
with a designated low score of 1 to a designated high
score of 5 indicating the extent to which they were
personally committed to the special reading program.

9. When responses are solicited at the end of the eight-
months period, parents of participating students will
reflect a measured mean score of 4 or higher on a
rating scale with a designated low score of 1 to a






Learning Exercise 3 
A-33



designated high score of 5 when judging the extent to
which they were committed to the successful implementation
of the special reading program.

10. At the end of the eighth month of the special reading
program, 90 percent of the participating students will
have had fewer than three nonjustifiable absences
("cuts"), as measured by attendance records.

11. At the end of the eighth month of the special reading
program, 70 percent or more of the participating
students will report that they enjoy reading more than
they did before entering the program. The fulfillment
of this objective will be measured by a locally developed
student questionnaire.
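Objectives stated in measurable terms, such as Objective 5, can be checked directly against pretest and posttest data. The following is a minimal sketch, not part of the original exercise; the function name and the six students' percentile ranks are hypothetical and serve only to show the arithmetic.

```python
def objective_met(pre, post, min_gain, min_share):
    """Return (share, met), where share is the fraction of students whose
    gain from pretest to posttest is at least min_gain points."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and the same length")
    gains = [after - before for before, after in zip(pre, post)]
    share = sum(1 for gain in gains if gain >= min_gain) / len(gains)
    return share, share >= min_share


# Hypothetical percentile ranks for six students (illustrative only).
pretest = [20, 35, 10, 42, 28, 15]
posttest = [38, 45, 30, 50, 44, 31]

# Objective 5: fifty percent of students gain 15 percentile points or more.
share, met = objective_met(pretest, posttest, min_gain=15, min_share=0.50)
print(f"{share:.0%} of students gained 15 or more points; objective met: {met}")
```

An objective such as "students will appreciate reading" offers no comparable threshold to test, which is one practical sign that it is not yet written in measurable terms.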






A-34

Learning Exercise 3

SELECTION OF OBJECTIVES
WORKSHEET

Task Number 1: Record the Roman numeral and key words of the goal your
panel chose and enter the number of members in your panel at the top of
the Individual Worksheet on page A-36. Circle the numbers of the objectives
on the Individual Worksheet which your panel feels are appropriate for the
evaluation of that goal. If your panel feels that the objectives presented
in the list are inappropriate or inadequate, write the objectives your
panel agrees are appropriate for your program goal in the spaces labeled
PROGRAM OBJECTIVE 12 and PROGRAM OBJECTIVE 13 below.

PROGRAM OBJECTIVE 12




PROGRAM OBJECTIVE 13



Learning Exercise 3

A-35

Task Number 2: After you have circled the objectives your panel favors or
have written in (and circled) new ones in the spaces on page A-34, rate each
objective on the following scale: 0 = Not Important; 1 = Important; 2 = Very
Important. Put a check in the appropriate boxes on the Individual Worksheet.
Next, circle the objective on the Group Report Form, tally the ratings of your
panel, and then enter the tallies in the appropriate boxes on the Group Report
Form. (On page A-38, there is a filled-out Group Report Form that illustrates
how this is done.)
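The tallying step can also be sketched in a few lines of code. This is an illustration only: the three panelists and their ratings are hypothetical, and each individual worksheet is assumed to be recorded as a mapping from objective number to a rating of 0, 1, or 2.

```python
from collections import Counter

RATING_LABELS = {0: "Not Important", 1: "Important", 2: "Very Important"}


def tally_ratings(individual_worksheets):
    """Map each objective number to a Counter of the ratings it received."""
    tallies = {}
    for worksheet in individual_worksheets:
        for objective, rating in worksheet.items():
            if rating not in RATING_LABELS:
                raise ValueError(f"rating must be 0, 1, or 2, got {rating!r}")
            tallies.setdefault(objective, Counter())[rating] += 1
    return tallies


# Hypothetical worksheets from a three-member panel (objective -> rating).
worksheets = [
    {7: 2, 8: 1, 10: 0},
    {7: 2, 8: 2, 10: 1},
    {7: 1, 8: 2, 10: 0},
]

group_report = tally_ratings(worksheets)
for objective in sorted(group_report):
    counts = group_report[objective]
    row = ", ".join(f"{RATING_LABELS[r]}: {counts[r]}" for r in (0, 1, 2))
    print(f"Objective {objective}: {row}")
```

Each printed row corresponds to one row of the Group Report Form, with one tally per rating column.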

Task Number 3: The leader of each panel reports on the panelists' selection of
objectives.



Learning Exercise 3

INDIVIDUAL WORKSHEET

PROGRAM GOAL NUMBER ______    KEY WORDS ______________________

TOTAL NUMBER OF PANEL MEMBERS ______

                  Rating of Objectives and Tallies of Ratings

Objective         Not Important      Important      Very Important
Number                 0                 1                2

    1                 [ ]               [ ]              [ ]
    2                 [ ]               [ ]              [ ]
    3                 [ ]               [ ]              [ ]
    4                 [ ]               [ ]              [ ]
    5                 [ ]               [ ]              [ ]
    6                 [ ]               [ ]              [ ]
    7                 [ ]               [ ]              [ ]
    8                 [ ]               [ ]              [ ]
    9                 [ ]               [ ]              [ ]
   10                 [ ]               [ ]              [ ]
   11                 [ ]               [ ]              [ ]
  *12                 [ ]               [ ]              [ ]
  *13                 [ ]               [ ]              [ ]

*These numbers represent objectives which have been written by the
panel.



Learning Exercise 3

GROUP REPORT FORM

PROGRAM GOAL NUMBER ______    KEY WORDS ______________________

TOTAL NUMBER OF PANEL MEMBERS ______

                  Rating of Objectives and Tallies of Ratings

Objective         Not Important      Important      Very Important
Number                 0                 1                2

    1                 [ ]               [ ]              [ ]
    2                 [ ]               [ ]              [ ]
    3                 [ ]               [ ]              [ ]
    4                 [ ]               [ ]              [ ]
    5                 [ ]               [ ]              [ ]
    6                 [ ]               [ ]              [ ]
    7                 [ ]               [ ]              [ ]
    8                 [ ]               [ ]              [ ]
    9                 [ ]               [ ]              [ ]
   10                 [ ]               [ ]              [ ]
   11                 [ ]               [ ]              [ ]
  *12                 [ ]               [ ]              [ ]
  *13                 [ ]               [ ]              [ ]

*These numbers represent objectives which have been written by the
panel.





Learning Exercise 3

GROUP REPORT FORM

PROGRAM GOAL NUMBER II    KEY WORDS Highly Committed

TOTAL NUMBER OF PANEL MEMBERS ______

                  Rating of Objectives and Tallies of Ratings

Objective         Not Important      Important      Very Important
Number                 0                 1                2

[Filled-out example for Objectives 1-11, *12, and *13; the handwritten
tally entries are not legible in this copy.]

*These numbers represent objectives which have been written by the
panel.




Learning Exercise 4 



LEARNING EXERCISE 4: MATCHING NEEDS STATEMENTS TO PROGRAM OBJECTIVES
AND PROGRAM ACTIVITIES



In the table on page A-40, you will find three columns. Column 1 contains
needs statements; column 2, program objectives; and column 3, program
activities. In the spaces provided at the left of columns 2 and 3, write in
the numbers of the needs statements that match the objectives and activities.



NEEDS STATEMENTS

1. Parent and community involvement activities were minimal except in
the few classrooms where Head Start pupils were enrolled. Since the
goals call for extensive community involvement in planning
instructional activities, there is a need to increase such
involvement.

2. Many Head Start-K-3 pupils have limited vocabularies and limited
ability to process and verbalize information in standard English.
Since the Head Start Program places major emphasis on language
development, the staff recognizes a need to upgrade pupils' verbal
skills.

3. It has been observed that individual needs of the target pupils are
not being addressed in the classroom. The school philosophy stresses
individualizing programs for each pupil. There is a need to ensure
that the staff implements this philosophy.

4. Almost 80 percent of the K-3 pupils tested scored below [illegible]
on the total Mathematics of the C.T.B.S. given in October 1974. These
scores do not appear to be comparable to pupils' demonstrated ability
in other areas. There is a need to identify means of improving pupil
performance in math.



PROGRAM OBJECTIVES

____ A [illegible] percent random sample of Head Start and K-3 pupils
who have been in the activities of the Oral Language Component will
significantly increase (at the [illegible] level of confidence)
scores on the Verbal Expression subtest of the Illinois Test of
Psycholinguistic Abilities from the October 1974 pretest to the May
1975 posttest.

____ The median percentile rank for those pupils continually enrolled
in K-3 mathematics programs for [illegible] days or more will be
[illegible] points higher in May 1975 than the median percentile rank
previously attained by the same pupils on the pretest in October
1974, as measured by the total Mathematics scores from the C.T.B.S.

____ During the 1974-75 school year, all teachers and aides
participating in the special program will provide individualized
instruction to the target pupils in their class. Evidence of this
will be by periodic classroom observations obtained by the project
staff and the evaluator.



PROGRAM ACTIVITIES

____ Grouping within each K-3 classroom will be flexible, changing
frequently in accordance with the changing needs of the pupils.
Groups will be formed by needs identified by diagnosis and recorded
on the class profile chart.

____ All Early Childhood Education teachers and aides will participate
in a four-week summer program during July 1975. The teachers and
aides will be involved in problem solving, developing materials, and
inservice for improving teaching skills in individualized
instruction.

____ Head Start-K-3 teachers will shape the instructional program in
language development by a hierarchy of performance objectives. This
activity will be evidenced by a class profile chart covering each of
the performance objectives in the classroom. The class profile chart
will be monitored in November, February, and May by an evaluation
committee made up of two teachers, two parents, the principal, and
the Director of Compensatory Education.



PROGRAM EVALUATOR'S GUIDE 



Section B 
DEVELOP AN EVALUATION PLAN 



The Evaluation Improvement Program 



PRECIS

The evaluation of a program, like the program itself, operates on a careful
plan based on what the program is designed to accomplish and whether it has
achieved those results. Thus, having learned the program's objectives and
needs (step 1), the evaluator then constructs a series of questions that
will be used after the program's completion to determine if it has achieved
its objectives. The evaluator then decides upon the instruments that will
be needed to gather the necessary information and whether they will be
selected from available instruments or developed. A schedule is then drawn
up for the administration of the instruments. Decisions must then be made
about the kinds of analyses that will be performed on the data and how the
final information will be reported and to whom. Finally, to improve the
chances that the report will be used productively, the evaluator discusses
its uses with all the recipients.






CONTENTS 



THE PRELIMINARY WORK . . . . . . . . . . . . . . . . . . . . . . . B-1

    Needs Assessment and Program Goals and Objectives . . . . . . . B-1
    Evaluation Design . . . . . . . . . . . . . . . . . . . . . . . B-1
    Assessment Instruments . . . . . . . . . . . . . . . . . . . . B-3
    Administration Dates and Personnel . . . . . . . . . . . . . . B-4
    Data-Analysis Techniques . . . . . . . . . . . . . . . . . . . B-5
    Monitoring Program Activities . . . . . . . . . . . . . . . . . B-5
    Monitoring Dates and Personnel . . . . . . . . . . . . . . . . B-5
    Key Reporting Dates
    Who Is to Receive the Report(s)
    Determining How the Data Reports Will Be Used . . . . . . . . . B-6

REVIEW OF PRELIMINARY PLANS . . . . . . . . . . . . . . . . . . . . B-6

EVALUATION TIMELINE . . . . . . . . . . . . . . . . . . . . . . . . B-7

DETERMINING AND OBTAINING REQUIRED RESOURCES . . . . . . . . . . . B-7

LEARNING EXERCISE 5: PLANNING FOR ASSESSMENT OF PROGRAM OBJECTIVES B-11



1. THE PRELIMINARY WORK



The preliminary work in developing a general evaluation plan requires
consideration of all of the steps involved in the evaluation process. In
this section, we shall look briefly at each of the several evaluation steps
and identify some of the questions that must be formulated at each. The
subsequent sections (C-H) of this Guide will develop in more detail the
things that you must know and do as you plan and implement your program
evaluation.

To help you visualize all the steps in a total plan, a Program Evaluation
Planning Form is shown on page B-2. This form was designed to guide your
thinking as you plan for the evaluation of a particular program. The text
that follows relates directly to each of the rows on the form.

Needs Assessment and Program Goals and Objectives



The assessment of needs and the setting of program goals and objectives are
parts of the program-planning cycle. Since the evaluation of a program is
based on what the program is trying to accomplish, the objectives need to be
sufficiently explicit so that whatever progress has been made towards
reaching those objectives can be assessed. The program evaluator must be
involved at this early planning stage, at least to the extent of reviewing
plans and making suggestions to ensure that it will be possible to evaluate
program objectives.

Evaluation Design

Evaluation design is essentially a systematic approach to the task of
gathering information to answer questions or make decisions. The technical
part of considering a design cannot begin until some assumptions are made
about what the evaluation is to accomplish. Some questions may relate to
progress towards program objectives, relative effectiveness of different
programs, or relative standing of various groups within a given area.
Specific decisions may be made as to keeping, expanding, or dropping a



PROGRAM EVALUATION PLANNING FORM

Program ______________________________

Purpose(s) of Evaluation ______________________________

Audience(s) for Evaluation ______________________________

Rows of the form (one blank row each):

    PROGRAM OBJECTIVES
    EVALUATION DESIGN
    ASSESSMENT INSTRUMENTS
    ADMINISTRATION DATES AND PERSONNEL
    DATA-ANALYSIS TECHNIQUES
    MONITORING PROGRAM ACTIVITIES
    MONITORING DATES AND PERSONNEL
    KEY REPORTING DATES
    WHO IS TO RECEIVE THE REPORT(S)
    DETERMINING HOW THE DATA REPORTS WILL BE USED



program, adopting or adapting procedures or policies, or where particular
programs seem to operate most effectively. The purposes identified in
Section A will help determine the important questions to be addressed.

Assessment Instruments

The selection of appropriate assessment instruments depends on the kind of
information needed to answer the questions posed in the evaluation. There
are a number of different kinds of assessment instruments that are available
and that should be considered when a selection is to be made. The following
are among the types of instruments to consider:

Norm-referenced tests

Criterion-referenced tests

Questionnaires

Interview guides

Observation record blanks

Rating sheets

Log sheets

Record summary forms

Structured narrative reports

Each of the several kinds of assessment instruments has its own strengths
and weaknesses and should be considered in the light of criteria developed
for that specific evaluation. Some general criteria might be:

• Does the instrument adequately measure what you want to measure?

• Will the instrument yield consistent results at different times and
with different groups?

• Is the instrument appropriate for the particular population in
question?

• Is the instrument easy to administer and score?

• Is the cost of the instrument, its administration and its scoring,
reasonable and within the budget?



B-4 



Administration Dates and Personnel

Once dates have been set for administration of the instruments and personnel
have been assigned to administer them, there are a number of questions to
consider, such as:

• Will the assessment dates conflict with other events in a way that
might diminish the reliability of the data?

• Will the assessment dates allow for adequate measurement of the
program or program elements?

• Will the assessment dates allow the data to be collected, analyzed,
and reported to the recipient on time?

• Who can do the assessment with the greatest accuracy and with the
least disruption to the regular school schedule?

• Will special inservice training be required to get good results?

• Do individuals involved in the data collection have a vested
interest in the outcome?

• Are personnel available on the staff, or will outside personnel be
required?
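Two of the date questions above (conflicts with other events, and whether results can be reported on time) lend themselves to a quick mechanical check. The sketch below is only an illustration; the function name, the dates, and the 14-day allowance for analysis are all hypothetical.

```python
from datetime import date, timedelta


def schedule_problems(assessment_day, report_due, analysis_days, blocked_days):
    """Return a list of scheduling problems; an empty list means none found."""
    problems = []
    if assessment_day in blocked_days:
        problems.append("assessment date conflicts with another event")
    results_ready = assessment_day + timedelta(days=analysis_days)
    if results_ready > report_due:
        problems.append("data cannot be analyzed and reported on time")
    return problems


# Hypothetical calendar for one school (illustrative dates only).
blocked = {date(1975, 5, 12), date(1975, 5, 26)}
report_due = date(1975, 6, 2)

first_try = schedule_problems(date(1975, 5, 26), report_due, 14, blocked)
second_try = schedule_problems(date(1975, 5, 14), report_due, 14, blocked)
print(first_try)   # both problems are reported for this date
print(second_try)  # no problems are reported for this date
```

The remaining questions, such as who has a vested interest in the outcome, call for judgment and cannot be reduced to a date comparison.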

Data-Analysis Techniques

Data analysis consists of organizing a quantity of data so that its meaning
may be understood. Techniques of analyzing data range from a simple rank
ordering of scores to very complex statistical treatment. Data-analysis
techniques allow the reader to identify relationships that are not apparent
in the initial raw data and make it possible to do such things as compare
different groups or the same group at different times.






Listed below are some methods of arranging or displaying data for
analysis:

Total raw scores
Mean (average) scores
Median scores
Modal scores
Percentage scores
Rank-order listings
Frequency distribution
Correlations
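For readers who keep scores in a computer file, each of the methods listed above can be computed with a few lines of code. The following is a minimal sketch using only the Python standard library; the scores, the maximum score of 20, and the paired pretest and posttest values are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, mode

scores = [12, 15, 15, 18, 20, 9, 15, 18]   # hypothetical raw scores
max_score = 20                              # hypothetical maximum possible score

total = sum(scores)                                  # total raw score
average = mean(scores)                               # mean (average) score
middle = median(scores)                              # median score
most_common = mode(scores)                           # modal score
percentages = [100 * s / max_score for s in scores]  # percentage scores
rank_order = sorted(scores, reverse=True)            # rank-order listing
frequency = Counter(scores)                          # frequency distribution


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with 2 or more values")
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / (spread_x * spread_y)


# Hypothetical paired pretest and posttest scores for the same pupils.
pre = [9, 12, 15, 18]
post = [11, 15, 16, 20]
print(total, average, middle, most_common)
print(rank_order)
print(sorted(frequency.items()))
print(round(pearson(pre, post), 3))
```

Which of these summaries to report depends on the questions the evaluation is meant to answer; the code only arranges the data.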


Monitoring Program Activities

Determining how a program is being conducted is done through a process of
program monitoring. Monitoring enables the evaluator to identify unexpected
situations or conditions that might impede the implementation of the project
and that need the immediate attention of the project director; to collect
data for interim reporting; and to observe unanticipated behavior.

Programs frequently include a great many activities. Consequently, it
may not be feasible to monitor every one of them, and criteria will have to
be established so that the evaluator will monitor only those activities that
will yield the most useful information. One such criterion might be the
degree to which the program would be affected if a particular activity were
or were not continued. Another might be to emphasize activities that appear
to be most closely related to the stated program objectives.

Monitoring Dates and Personnel

The same general considerations should be addressed here that were covered
under Administration Dates and Personnel.

Key Reporting Dates

In establishing reporting dates, the evaluator must determine when the
information in the report is needed by the recipient. Also to be considered is
when the information to be reported will be available.






Who Is to Receive the Report(s)

An evaluation report may be designed to answer questions and provide
information to a number of audiences. To determine who should receive a report,
the evaluator must know the original purposes of the evaluation and what its
uses will be, which should have been a part of the initial planning. In any
case, a distribution list should be reviewed with the appropriate
administrator so that the audiences and intended uses may be verified and the
number of copies of the evaluation reports determined.

Determining How the Data Reports Will Be Used

There are several things the program evaluator can do to improve the chances
that evaluation reports will be used in a productive manner, and these are
discussed in Section G on reporting. For example, the evaluator determines
from each recipient of the report the kinds of information needed from the
report or the kinds of information the recipient would accept as evidence
in regard to a particular question. Joint planning with recipients of the
evaluation report is perhaps the most positive action an evaluator can take
to ensure that the report will serve its purpose.



2. REVIEW OF PRELIMINARY PLANS 



Once the preliminary evaluation plan has been completed, it should be reviewed
by as many as possible of the people who are involved in the program or who may
be affected by the results. In this way, any errors or misconceptions in the
plan can be immediately changed or corrected. This review process may bring
out honest differences of opinion as to how the evaluation should be designed
and implemented. It is not always possible (or even desirable) for a plan to
receive universal approval in this kind of review, but a careful reading at
this stage may avoid unexpected opposition at the time of reporting.



Review Evaluation Plan with:

Program staff
School principal
District administrator
Representative from funding agency
Others?



3. EVALUATION TIMELINE

Following the review of the evaluation plan and the determination that the
required resources are available, some sort of implementation schedule should
be prepared. One of the more common procedures is to use a timeline on which
all actions are listed and the estimated amount of time and actual dates of
implementation are recorded. An example of such a timeline, which allows
space for an estimate of the amount of staff time required for each of the
tasks, is shown on the following page.
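A timeline of this kind can also be kept as a simple list of records. The sketch below is an illustration, not the form used in this Guide; the `Task` record, the task names, the dates, and the personnel categories are all hypothetical. It shows how the person-day estimates for each task add up to a total for each personnel category.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Task:
    name: str
    completion_date: date
    person_days: dict = field(default_factory=dict)  # category -> estimated days


def total_person_days(tasks):
    """Sum the estimated person days for each personnel category."""
    totals = {}
    for task in tasks:
        for category, days in task.person_days.items():
            totals[category] = totals.get(category, 0) + days
    return totals


# Hypothetical timeline entries (illustrative only).
timeline = [
    Task("Select assessment instruments", date(1974, 9, 30),
         {"evaluator": 5, "teachers": 10}),
    Task("Administer pretest", date(1974, 10, 15),
         {"teachers": 20, "aides": 8}),
    Task("Analyze data and prepare report", date(1975, 6, 1),
         {"evaluator": 12}),
]

totals = total_person_days(timeline)
for category in sorted(totals):
    print(f"{category}: {totals[category]} person days")
```

The totals per category are the figures the evaluator returns to when checking whether the required staff time is actually available.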

4. DETERMINING AND OBTAINING REQUIRED RESOURCES

At this stage of planning, the evaluator must have answers to two questions:
(1) What resources are required for carrying out the planned evaluation?
(2) Can they be obtained? If the resources do not match the requirements,
some adjustments and compromises must be worked out.

The first thing the evaluator needs to identify is the anticipated workload
on available personnel. To do this, he must go back to the timeline and
look at the estimated total person days for each personnel category. Then he
must identify the persons available to fill those needs. For example, if the
evaluator has estimated that 300 days of teacher assistance will be
required to carry out the plan, and there are 10 teachers in the same program,
that averages out to 30 work days per teacher. If this is a realistic amount
of time to expect from classroom teachers and the teachers themselves
agree to the time commitment, there is no problem. If the workload is not
acceptable, a more realistic amount of time must be planned.
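The arithmetic in the example above is a simple division of total person days by the number of people available. The sketch below reproduces it with the figures from the text (300 days averaging out to 30 work days per teacher, which implies 10 teachers); the function names and the 20-day ceiling are hypothetical and stand in for whatever limit the staff agrees is realistic.

```python
def average_days_per_person(total_person_days, staff_count):
    """Average workload per person for one personnel category."""
    if staff_count <= 0:
        raise ValueError("staff_count must be positive")
    return total_person_days / staff_count


def workload_is_acceptable(total_person_days, staff_count, max_days_each):
    """True when the average workload does not exceed the agreed ceiling."""
    return average_days_per_person(total_person_days, staff_count) <= max_days_each


per_teacher = average_days_per_person(300, 10)
print(per_teacher)                                          # 30.0

# Hypothetical ceiling: teachers agree to no more than 20 days each.
print(workload_is_acceptable(300, 10, max_days_each=20))    # False
```

When the check fails, the options are the ones the text goes on to describe: shift work to another category, add staff, or reduce the required work days.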

As staff needs are clarified, there will be several decisions to be made,
such as: Can the work-day requirements be shifted from one personnel category
to another? Can additional persons be hired? Can some of the required work
days be cut back? Resolution must be reached between work-day requirements
and personnel available to fill the requirements.



EVALUATION TIMELINE

Columns of the form:

    Tasks
    Completion Date
    Month: 1 2 3 4 5 6 7 8 9 10 11 12
    Number of Person Days

(Blank rows are provided for each task.)

Total Person Days






(■ 













The costs of required evaluation materials and equipment have to be
identified and matched against both available materials and equipment and
the budget. Resolution must be reached when there is a discrepancy between
required materials and equipment and the available resources.

Costs of such required services as consultants and data processing must
be matched against resources. Resolution between required services and
resources must be made before the evaluation plan can be put into operation.

In summary, evaluation planning brings about a balance between the
resources that are required and those that can be made available. Give and
take is involved here: In some cases, resources can be added to match the
requirement; in others, the requirement is modified to a less ambitious
approach.



B-10 



.CHECKLIST OF THE MAJOR STEPS 
REQUIRED IN DEVELOPING AN 
EVALUATION PLAN 



• Review needs assessment and goals
and objectives to determine their
interrelatedness.

• Identify the purposes for which the
evaluation is being conducted and
the probable uses of the evaluation.

• Review objectives to ensure they are
written in measurable terms.

• Identify the questions that must be
answered at the end of the year as
indicated by the objectives, the
purposes, and the probable uses of
the evaluation.

• List appropriate kinds of instruments
to gather the information required to
answer the questions formulated above.

• Determine approximate dates when the
various kinds of information would
most appropriately be gathered.

• Determine types of data-analysis
procedures that would give the most
appropriate information to answer
the questions formulated earlier.

• List the activities that need to
be monitored together with most
appropriate dates to secure the
information.

• List the kinds of reports that will
be made, both interim and summative;
who will receive these reports; and
the dates the reports are due.

• For each report, list the potential
uses to be made of the information
and be sure that they match the
information to be gathered.



Check (for each step):   In Progress   Completed   Date Completed






Learning Exercise 5 
B-11 



LEARNING EXERCISE 5: PLANNING FOR ASSESSMENT OF PROGRAM OBJECTIVES



In the Rosedale School District, one elementary school has had particularly
rapid growth, resulting in a cultural mix of pupils it has never before
experienced. There have been many discipline problems and fights on the
school grounds, and the parents have demanded that some action be taken.
There have also been complaints regarding the quality of instruction and
the achievement levels of the pupils at several grade levels.

Rather than addressing these complaints as isolated problems, a general
assessment was made of the educational needs of the total district. As a
result of this effort, a number of general goals and specific objectives were
developed for four programs, and several changes were planned for
implementation during the following year. Because of limited resources, the
progress made toward reaching all of the objectives for each of the four
programs could not be evaluated during the first year.

Here is some information about the four programs:

Kindergarten Program

At the kindergarten level, there was a high rate of transiency. The staff
felt that the test score means of the total kindergarten population were
unduly influenced by transients and that this influence distorted the test
results downward. The parents wanted to know what a reasonable expectation
might be for pupils who attended school on a regular basis, and the staff
needed more precise information about all of the pupils' level of
achievement.

Of the several objectives for the kindergarten program, the following
one was selected for evaluation:

Kindergarten pupils with an attendance record of 75 percent or better
will show an improvement in language skills by achieving a median gain
of 30 raw score points on the school-adopted language development test.
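The kindergarten objective can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original guide: the record layout, the function name `median_gain`, and the pupil scores are all hypothetical, and attendance is assumed to be stored as a fraction of days attended.

```python
from statistics import median


def median_gain(records, min_attendance=0.75):
    """Median raw-score gain for pupils who meet the attendance cutoff."""
    gains = [
        r["post"] - r["pre"]
        for r in records
        if r["attendance"] >= min_attendance
    ]
    if not gains:
        raise ValueError("no pupils meet the attendance cutoff")
    return median(gains)


# Hypothetical pupil records (illustration only).
pupils = [
    {"attendance": 0.95, "pre": 40, "post": 75},  # gain 35
    {"attendance": 0.80, "pre": 35, "post": 63},  # gain 28
    {"attendance": 0.60, "pre": 30, "post": 40},  # excluded: below 75 percent
    {"attendance": 0.75, "pre": 50, "post": 82},  # gain 32
]

gain = median_gain(pupils)
objective_met = gain >= 30  # the objective calls for a median gain of 30 points
```

Filtering on attendance before taking the median is what keeps transient pupils from pulling the result downward, which is the concern the staff raised.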




B-12 

Citizenship Program - Grades 1, 2, and 3

To deal with the problem of fighting and discipline, a schoolwide program
of instruction in multiculture appreciation was developed with a specified
curriculum to provide each pupil with planned experiences in a different
culture. This was supplemented with a system of counseling in which each
pupil participated in a number of different groups.

The objective to be evaluated: 

In grades 1, 2, and 3, where all pupils receive group counseling and
instruction in the appreciation of multicultural differences, pupils
will demonstrate an improved knowledge of the cultural differences
emphasized in the curriculum as evidenced by the district-made test
covering the subject matter taught.

The incidence of fighting on the playgrounds will show a 20 percent
reduction as compared to the records of the previous year.

Mathematics - Grade 10

As one result of a community survey, a minimum proficiency level in math was
established for all grade 10 pupils, and those not meeting this level were
enrolled in a remedial math class where individualized instruction was to be
emphasized and individual diagnostic records were to be maintained.

The objective to be measured:

Tenth grade pupils receiving remedial math instruction will show a
five-month mean gain in math computation for every five months of
instruction as measured on the school-adopted standardized math test.

English - Grade 12

All high school seniors were required to take at least one semester of
English. Many of the graduates who went on to colleges or universities,
however, were not passing the test for written expression even though they
had taken the college preparatory course. The school board directed that
standards be developed for the class and that evaluation of the results
be made.



B-13



The objective to be measured:

All high school seniors receiving a grade of C or higher in senior
English writing class and making application to a college or university
will earn a passing score on the writing section of that institution's
entrance examination.

Using the Program Evaluation Planning Form on the next page, you may
assume that an adequate needs assessment has been done, the goals have been
adopted, and that the programs are properly designed to meet the identified
needs.

Your group is to select one of the four programs and fill out the
questions in each column on the planning form. The questions on page B-13 may
be helpful in filling out the form.






B-14 






PROGRAM EVALUATION PLANNING FORM

Program

Purpose(s) of Evaluation

Audience(s) for Evaluation



Column headings of the form:

PROGRAM OBJECTIVES
EVALUATION DESIGN
ASSESSMENT INSTRUMENTS
ADMINISTRATION DATES AND PERSONNEL
DATA ANALYSIS TECHNIQUES
MONITORING PROGRAM ACTIVITIES
MONITORING DATES AND PERSONNEL
KEY REPORTING DATES
WHO IS TO RECEIVE THE REPORT(S)
DETERMINING HOW THE DATA REPORTS WILL BE USED



PROGRAM EVALUATION PLANNING FORM

Program

Purpose(s) of Evaluation

Audience(s) for Evaluation


PROGRAM OBJECTIVES

What objective is being evaluated?
What is the goal or need statement to which this objective relates?
Is this objective written in such a form that it can be measured?
Is the implied measure appropriate for the objective?


EVALUATION DESIGN

What questions must this design address?
What information must this design be able to produce in order to
answer these questions?
To what purposes of evaluation do these questions relate?
What information will the audience accept as evidence related to
the purpose of the evaluation?


ASSESSMENT INSTRUMENTS

What kinds of assessment instruments will be most appropriate to
secure the information required in the design? (Norm or
criterion referenced tests - questionnaires - interviews -
observations - rating scales - log sheets - narrative reports)


ADMINISTRATION DATES AND PERSONNEL

During what month or months should assessment take place?
Who would be the most appropriate person to collect the data?
Who is responsible for assigning personnel and dates?


DATA ANALYSIS TECHNIQUES

What kinds of scores will be most useful in providing the
information needed, as identified in the purpose and in
the design?
What kinds of data analysis will be most appropriate?
Will outside help be required to do the required analysis?


MONITORING PROGRAM ACTIVITIES

What activities are central to the accomplishing of the objectives?
What information must be collected to accomplish the purposes of
the evaluation?


MONITORING DATES AND PERSONNEL

Who will perform the monitoring function?
How frequently must the activities for this objective be monitored?
To whom should the monitoring be reported?


KEY REPORTING DATES

Who will be interviewed to ensure that reporting dates meet
decision or user requirements?
Who will establish reporting deadlines?


WHO IS TO RECEIVE THE REPORT(S)

What different audiences will receive evaluation reports on this
objective?
Have the questions identified by the audiences during the initial
design step been addressed in the evaluation report?
Have the purposes of the evaluation been accomplished?


HOW THE DATA REPORTS WILL BE USED

What activities have been planned to ensure the most effective
use of the evaluation reports?



PROGRAM EVALUATOR'S GUIDE 



Section C 



DETERMINE THE EVALUATION DESIGN
AND
DO THE SAMPLING



NOTE TO USERS OF THIS GUIDE

In several sections of this Guide, as in this one,
each Learning Exercise is placed immediately after
the discussion of the topic(s) the exercise is based
on. In the other sections, the Learning Exercises are
grouped at the end. There are advantages in both ways
of presenting these materials. Which way do you prefer?



The Evaluation Improvement Program



PRECIS



Evaluation design is the key to obtaining valid and reliable information for
decision making, which is, of course, the most important purpose of program
evaluation. Applying the principles of design helps assure a high level of
objectivity by eliminating personal opinion as a major factor in program
evaluation. Good design enables one to compare with confidence the
performance of students on such dimensions as past and future, program
groups and nonprogram groups, Treatment A group and Treatment B group. It
lets us know what change has occurred, what the gains or losses have been,
which of several procedures is preferable, and whether sought-after
objectives have been reached.

If knowing about gains or growth is critical, a pretest-posttest design
should be selected. If assurance is needed that program treatments have had
impact, nonprogram groups should be measured along with program groups. If
it is necessary to look at subparts of the program for cause/effect
relationships, or at various combinations of participants, an expanded
factorial design should be chosen. If homogeneity across groups is an
important factor, individuals should be assigned to groups on a random basis.

To be effective, the design of an evaluation plan should be selected
well in advance of program activities. Advance planning of this sort is
not only more effective but also more economical, for it allows the use of
sampling procedures the evaluator can use to apply the design to
representative segments of students, of other populations, or of instruments.

In summary, careful attention to design will help increase the
confidence you have in the results of a program evaluation. Careful
attention to, and use of, sampling will produce comparable results using
considerably fewer subjects in the program-evaluation activities.



CONTENTS 



 1. INTRODUCTION TO EVALUATION DESIGN                              C-1

 2. HOW TO SELECT A DESIGN                                         C-1

      At What Level Are the Students Functioning?                  C-2

      How Much Growth Occurred During the Program?                 C-3

      How Does This Growth Compare with Expectations?              C-5

      What Elements of the Program Contributed to the Gain?        C-8

 3. INTERPRETABILITY OF RESULTS                                    C-12

 4. WHAT TO DO TO AVOID PITFALLS                                   C-13

 5. OTHER CONSIDERATIONS                                           C-14

 6. MONITORING ACTIVITIES                                          C-15

      LEARNING EXERCISE 6: EVALUATION DESIGN                       C-16

 7. INTRODUCTION TO SAMPLING                                       C-20

      LEARNING EXERCISE 7: SAMPLING CONSIDERATIONS                 C-24

 8. HOW TO SELECT A RANDOM SAMPLE                                  C-26

      Check Your Random Sample before Collecting Data              C-27

      LEARNING EXERCISE 8: RANDOM SAMPLING                         C-28

      Stratified Random Sampling                                   C-32

      Matrix and Multistage Sampling                               C-34

 9. HOW LARGE SHOULD A SAMPLE BE?                                  C-35

10. TWO STUDIES ON SAMPLING                                        C-37

      Number 1                                                     C-37

      Number 2

11. A FINAL WORD ON DESIGN AND SAMPLING                            C-40



1. INTRODUCTION TO EVALUATION DESIGN

A question of primary importance in program evaluation is: "How much more
did pupils learn by participating in the program than they would have learned
without it?" The answer involves two bits of information:

• How much students improved between the time the program
began and ended

• An estimate of how they would have done without the
program

The first is relatively easy to answer if proper instruments are used. The
second is more difficult.

The adequacy of the design the program evaluator selects can be judged
by the extent to which the results can be interpreted and generalized to 
other similar kinds of groups and programs. An adequate design helps raise 
the confidence the evaluator and program director can place in the results. 



2. HOW TO SELECT A DESIGN 

The particular design you decide upon will depend upon the types of questions
you want to answer. Most program evaluation has, in past years, been more
subjective than objective. Have you ever heard someone say, "Of course it's
a good program. You can just tell by observing the students and the teacher
in action. Anyone can tell it's good."?



[Figure: "I know it's good because I feel it here." (program --> warm feeling)]



This is not what program evaluation is all about. What is needed is the kind
of design that will serve to provide valid data for decision making and sound
justification for continued funding.



There are many designs that are usable in a school setting that do that, but
only a few will be presented here. They deal with these questions:

1. At what level are the students functioning at the beginning
and at the end of the program?

2. How much growth occurred during the program?

3. How does this growth compare with our expectations?

4. What elements of the program contributed to the gain or loss?



At What Level Are the Students Functioning?

Knowing where students are functioning at the outset is necessary to
successful implementation of the program. A program may be designed for
certain types of students with skills at an assumed level. Unless data are
available that confirm the fact that pupils in that program really are at
that skill level, you may miss the target. A test given early in the program
can give you this information.



Are the students entering this program
really functioning at the level expected?

premeasurement --> program



This is a legitimate use of a single testing session early in the program.
It provides a benchmark that indicates where things stood at the beginning.

Sometimes the program is well under way before an evaluation plan is
developed, which is a practice to be discouraged. What does the evaluator do
in such a case? One option is to plan a design that uses a test at the end
of the program.



At what level are students functioning
at the end of the program?

program --> postmeasurement



If the design provides only for this, it really tells us nothing about
the effectiveness of the program. It tells us something about what students
know after the treatment of the program, which may be used in exploring and
developing ideas for further program planning. But it is not a design that
will lead to any useful conclusions about the effectiveness of the program
at hand.

Fortunately, there are some "retrospective" measures that can be
attempted. Unless the year is about over, a test given even as late as
midway in the program cycle will give the information against which final
results can be compared. Another useful option may be to go to whatever
student records there are that indicate general levels of past performance.
The conversion of historical data to baseline data is, at best, "messy," but
it does offer an expedient measure that can save a late-starting program
evaluation.

How Much Growth Occurred During the Program?

Only when this question is asked do we begin to be concerned about the effect
that the program has had on the students.

How much did students improve
during the program?

premeasurement --> program --> postmeasurement



While this design will tell us how much change has occurred, it, too, really
does not address the basic question, "How much more did pupils learn by
participating in the program than they would have learned without it?"
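As an illustration of the pretest-posttest design just described, the sketch below computes a mean gain for pupils who have both scores. It is only an example under stated assumptions: the function name `mean_gain`, the pupil identifiers, and the scores are hypothetical, and pupils missing either measurement are simply left out.

```python
from statistics import mean


def mean_gain(pre, post):
    """Mean gain over pupils who have both a pretest and a posttest score.

    `pre` and `post` map a pupil identifier to a score.
    """
    matched = sorted(set(pre) & set(post))
    if not matched:
        raise ValueError("no pupil has both a pretest and a posttest score")
    return mean(post[p] - pre[p] for p in matched)


# Hypothetical scores (illustration only).
pre_scores = {"p1": 20, "p2": 25, "p3": 30}   # p3 left before the posttest
post_scores = {"p1": 28, "p2": 31, "p4": 40}  # p4 entered after the pretest

gain = mean_gain(pre_scores, post_scores)  # uses p1 and p2 only
```

A gain computed this way describes change during the program. By itself it says nothing about what pupils would have learned without the program.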

A single-group time-series design uses students in the program as their
own control group. The same measurement is made on the same students at
regular intervals several times before and after the program. If the program
appears to disturb the trend of measurement results in a positive way, this
may be evidence that the program has been effective. This design might be
used within a program if a teacher wished to increase the number of new words
learned each week beyond the current rate and introduced a special reward
system for the child learning the greatest number of new words each week for
the next month. Weekly records kept before and after introduction of the
system might look like this:



[Figure: Example. Graph of results before and after the intervention,
showing the actual results, the best guess of pupil status without the
program, and the probable gain between the two.]



This design addresses the same question, but gives better evidence as to
whether a real change has occurred.



How much did students improve during the program?

premeasurement 1 --> premeasurement 2 --> premeasurement 3 --> program -->
postmeasurement 1 --> postmeasurement 2 --> postmeasurement 3
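One way to form the "best guess of pupil status without the program" in a time-series design is to extend the pre-program trend forward. The sketch below does this with a straight line fitted by least squares. The straight-line assumption, the function names, and the weekly counts are illustrative choices, not something the guide prescribes.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two pre-program measurements")
    xs = range(n)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def gain_over_trend(before, after):
    """Actual results minus the projection of the pre-program trend."""
    slope, intercept = fit_line(before)
    start = len(before)
    projected = [intercept + slope * (start + i) for i in range(len(after))]
    return [actual - guess for actual, guess in zip(after, projected)]


# Hypothetical weekly counts of new words learned (illustration only).
before = [10, 11, 12, 13]  # four weeks before the reward system
after = [17, 18, 19, 20]   # four weeks after it was introduced

probable_gain = gain_over_trend(before, after)
```

With only a few measurements before the program, the projected trend is rough, so differences like these are better read as suggestive than as proof.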



C-5



How Does This Growth Compare with Expectations?

Usually the reason a new program is instituted is that it is thought to be
better than what currently exists. To find out if the program is better, the
evaluator needs to consider what happened as a result of the program and what
might have happened if the program had not been introduced.

Whenever possible, some kind of reference group should be used to
compare the status of program participants to that of similar persons not
participating in the program. There are three kinds of reference groups:
(1) control groups, (2) comparison groups, and (3) norm groups. Care must
be taken that persons in the reference group are as much like those in the
program as possible with respect to age, ability, reading level, ratio of
males to females, number of minorities, etc. Persons need not match on a
one-to-one basis (in fact, it is better they do not), but the overall group
profiles should be similar. If it is not possible to find a reference group
in the same school, try to find one in another school which is similar. In
determining whether another school is similar, consider the following:

• prior achievement of students

• population density of community

• size of school

• school organization (e.g., K-3 vs. K-6)

• teacher training and experience

• median family income

• expenditures per student

• eligibility for state and federal programs

• racial composition

• administration/teaching philosophies

Program and control groups are formed when pupils (or other participants)
are randomly assigned to program activities or nonprogram activities within a
program-evaluation model. If it is possible to make random assignment to one
group or the other, the concerns that program versus nonprogram groups be
alike do not apply. Even so, with a random assignment it would be a good idea
to check whether there may be drastic differences between groups.
Randomization is an acceptable way of assuring "likeness." If it is
administratively feasible, it is the best possible way to control grouping.
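The random assignment and the follow-up check described above can be sketched as follows. This is an illustration only: the function names, the even split into two groups, and the prior scores are assumptions made for the example, and the difference in mean prior scores is just one simple way to look for a drastic difference between groups.

```python
import random
from statistics import mean


def assign_groups(pupil_ids, seed=None):
    """Randomly split pupils into a program group and a control group."""
    rng = random.Random(seed)  # a fixed seed makes the assignment repeatable
    shuffled = list(pupil_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_difference(group_a, group_b, prior_scores):
    """Difference in mean prior achievement between the two groups."""
    return (mean(prior_scores[p] for p in group_a)
            - mean(prior_scores[p] for p in group_b))


# Hypothetical prior achievement scores for 20 pupils (illustration only).
prior = {pupil: 40 + (pupil * 7) % 25 for pupil in range(20)}

program_group, control_group = assign_groups(prior, seed=1)
difference = mean_difference(program_group, control_group, prior)
```

If the difference turns out to be large relative to the spread of scores, the evaluator might note it when interpreting results or repeat the random assignment before the program begins.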



Random Assignment

How do nonprogram students identical with
program students compare with regard to growth?

program students:      premeas. --> program --> postmeas.
nonprogram students:   premeas. -->         --> postmeas.



Sometimes in such a design, it may be advantageous to plan for no
pretest. This may be because new and unfamiliar concepts are to be taught
and it seems unlikely that information will be gained by a pretest. Or, in
the case of very young children, the amount of testing that can be done is
very limited. Or sometimes the use of a pretest itself may influence or
contaminate the behavior of the students. With random assignment of pupils
to program and nonprogram groups, a posttest-only design will yield the
needed information on growth.



Random Assignment

How do nonprogram students identical with
program students compare with regard to growth?

program students:      program --> postmeasurement
nonprogram students:           --> postmeasurement



A comparison group is one in which existing classes of comparable
pupils have been identified as nonprogram participants. Comparison groups
are not specially organized as through random assignments. Both control
groups and comparison groups must be given the same instruments on the
same schedule as the program group. If these groups are in the same school
with the program group, care must be taken that "contamination" does not
occur. All too frequently, if a new method in the new program is getting
good results, the word will filter through to the control or comparison
group teacher, and the new method will find its way into that group's
classroom. If this happens, the entire evaluation may be invalid.



Nonrandom Assignment

How do comparable nonprogram students
compare with program students?

program students:      premeas. --> program --> postmeas.
nonprogram students:   premeas. -->         --> postmeas.
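With a pretest and a posttest for both groups, one simple summary is the program group's mean gain minus the comparison group's mean gain. The sketch below shows that arithmetic. The function name `net_gain` and all scores are hypothetical, and the calculation assumes both groups took the same instruments on the same schedule, as the text requires.

```python
from statistics import mean


def net_gain(program, comparison):
    """Program mean gain minus comparison mean gain.

    Each argument is a list of (pretest, posttest) pairs, one per pupil.
    """
    def average_gain(pairs):
        return mean(post - pre for pre, post in pairs)

    return average_gain(program) - average_gain(comparison)


# Hypothetical (pretest, posttest) scores (illustration only).
program_scores = [(40, 55), (42, 58), (38, 52)]     # gains 15, 16, 14
comparison_scores = [(41, 50), (39, 47), (43, 53)]  # gains 9, 8, 10

extra_growth = net_gain(program_scores, comparison_scores)
```

Because comparison groups are not randomly formed, a difference like this can reflect how the groups differed to begin with as well as the program itself.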



When neither a control nor a comparison group is feasible, norm groups
may be used. Norms on standardized tests are the most common example of this
type. Since norm groups are generally representative of a broader population
they may not be comparable to the particular program group, and care must be
taken when interpreting the findings.

When norm groups are used as a basis for comparison, the expectation
is that pupils will be relatively higher with respect to the norm at the end
of the program treatment than they were at pretest.



[Figure: Norm group and program group results at pretest and at posttest.]



How do program students compare with
some norm group?

program students:   premeasurement --> program --> postmeas.
norm groups:        norm           -->         --> norm
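The expectation that pupils will stand "relatively higher with respect to the norm" at posttest can be illustrated by expressing the program group's mean as a distance from the norm mean in standard-deviation units. This is a sketch under assumptions: the norm means and standard deviations below are invented for the example and do not come from any published test, and a real analysis would use the publisher's norm tables for the matching time of year.

```python
def standing(program_mean, norm_mean, norm_sd):
    """Program mean relative to the norm group, in norm standard deviations."""
    if norm_sd <= 0:
        raise ValueError("norm standard deviation must be positive")
    return (program_mean - norm_mean) / norm_sd


# Hypothetical fall and spring figures (illustration only).
pretest_standing = standing(program_mean=42.0, norm_mean=50.0, norm_sd=10.0)
posttest_standing = standing(program_mean=55.0, norm_mean=58.0, norm_sd=10.0)

# The program group is still below the norm but has closed part of the gap.
moved_up = posttest_standing > pretest_standing
```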



C-8



Another type of comparison that can be made is to compare the progress
of a special program group with its own past performance. If students who
have typically been growing at a rate of one-half a year for each year in
school achieve a year's growth in the first year of a new program, this may
be evidence that the program is effective.



How does student achievement in the new
program compare with their past performance?

current program:    premeas. --> program --> postmeas.
past performance:   average growth per year
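The comparison with past performance reduces to two rates. The sketch below works through the example in the text, where pupils have been growing half a year per year in school and then gain a full year. The function names and the specific figures (two years of achievement after four years in school) are hypothetical values chosen to match that example.

```python
def past_growth_rate(achievement_years, years_in_school):
    """Average growth per year in school before the program."""
    if years_in_school <= 0:
        raise ValueError("years_in_school must be positive")
    return achievement_years / years_in_school


def growth_ratio(program_year_gain, past_rate):
    """How many times faster pupils grew during the program year."""
    return program_year_gain / past_rate


# Hypothetical group: two years of achievement after four years in school.
rate_before = past_growth_rate(achievement_years=2.0, years_in_school=4)

# During the first program year the group gains one full year.
ratio = growth_ratio(program_year_gain=1.0, past_rate=rate_before)
```

A ratio above 1 means pupils grew faster in the program year than their own history would have predicted.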



What Elements of the Program Contributed to the Gain?

Why is this question important?

• If we knew what elements of the program made more difference and
which made less, we would be in a position to make improvements in
the program.

• Information about effective and less effective elements could be
used to reallocate resources of time and money.

• If we knew what elements were essential, we could predict the
chances of the program being successful elsewhere.

Unfortunately, finding out what parts of a program contributed most to
the gains achieved is a very difficult task. Effective programs may contain
many factors related to success, and the task of separating them makes the
evaluation design more complex. Thus, with this kind of design there is an
even greater need for planning carefully in advance. One certainly would not
construct this type of evaluation design after a program has been started.

Ideally, the evaluator would like to take into account as many factors
as possible at one time in order to isolate just what it is that is causing
the changes which do occur. Unfortunately, the more factors that are put
into a design, the more complex the design becomes. Suppose we want to
investigate which of two types of instructional materials can be used
as effectively without aides as with aides. This would require at least
four classrooms. That is, there would need to be at least one teacher for
each combination of aides and type of material: aides present, type 1;
aides absent, type 1; aides present, type 2; and aides absent, type 2.





                       Instructional Material
                       Type 1          Type 2
Aides    Present       Teacher 1       Teacher 2
         Absent        Teacher 3       Teacher 4



If the teachers' skills and experience are very similar, this simple design
could be used. (In the real world, however, it is very unlikely that your
teachers will be so evenly matched in training, experience, and talent.)
In order to allow for differences in teachers, at least two should be
assigned to each combination of aides and type of material, making an
eight-classroom design.





                       Instructional Material
                       Type 1          Type 2
Aides    Present       Teacher 1       Teacher 3
                       Teacher 2       Teacher 4
         Absent        Teacher 5       Teacher 7
                       Teacher 6       Teacher 8



This design is based on the assumption that other relevant factors are held
constant (i.e., amount of instructional time, type of instructional method,
ability level of pupils, etc.) across all classes. If pupils can be randomly
assigned to classes, factors associated with pupil characteristics should be
about equally distributed and should not unduly influence one group more than
another.
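The count of classrooms in a design of this kind is the product of the number of levels of each factor, multiplied by the number of teachers per combination. The following sketch enumerates the combinations for the aides-by-material example. The function name and the dictionary layout are illustrative assumptions.

```python
from itertools import product


def factorial_cells(factors, teachers_per_cell=1):
    """List every combination of factor levels and count classrooms needed.

    `factors` maps a factor name to its list of levels.
    """
    names = list(factors)
    cells = [dict(zip(names, combo)) for combo in product(*factors.values())]
    return cells, len(cells) * teachers_per_cell


factors = {
    "aides": ["present", "absent"],
    "material": ["type 1", "type 2"],
}

# One teacher per combination gives the four-classroom design.
cells, classrooms_simple = factorial_cells(factors)

# Two teachers per combination gives the eight-classroom design.
_, classrooms_doubled = factorial_cells(factors, teachers_per_cell=2)
```

Adding a third factor with three levels to the same call would raise the number of combinations from 4 to 12, which shows how quickly the classroom requirement grows.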

This type of design is very flexible and can be expanded to include
more factors, and more than two categories or levels per factor. Obviously,
the number of classrooms increases rapidly, and only the larger districts
will have enough classrooms to use more complicated designs. An expanded
design, however, makes it possible to seek answers to many questions in the
course of the program evaluation. This is fortunate in those cases in which
it is believed that two or more factors interact with each other: the use of
aides, for example, the availability of a reading laboratory, or
individualized curriculum materials. In an expanded design, each of the
factors may be considered individually or in terms of the effects they have
on one another.

Notice that to use an expanded design you must define the factors very
precisely. For example, individualized instruction may mean a student moves
at his own rate or that he receives individual help or that he has his own
objectives. Moreover, you might want to look at the degree, extent, or
intensity of the factor. You might compare the effects of individual help
every day to the effects of individual help every other day.

There are some complications which should be mentioned. Is the teaching
comparable in the several classrooms? Is the design for each classroom
followed with rigor? Are staff members committed to their approach? A
negative answer to such questions could bias the results.

The questions you must consider in planning this or any other kind of
program evaluation design are:

1. What are the most important questions that a program
evaluation can help answer and how much information
do you need to answer these questions?

2. What other relevant variables are there that might
have an effect on the outcome and how can they be
held constant or randomly distributed among
classes or subgroups?

The second question involves a factorial design. Only larger schools
and districts have enough classrooms to use such a design. Under certain
conditions, smaller schools may undertake a modification of it.

A factorial design may be used if individual pupils can be randomly
assigned to each different combination of factors (each cell in the table on
page C-11). For example, to study the effects of differing amounts of time
spent in a language laboratory by students at various levels of ability,
one might randomly assign pupils to make up a group for each condition.
Such a design might look like this:



Random Assignment

                    Amount of Time Spent in Laboratory
Ability Level    10 Min.       20 Min.       30 Min.       40 Min.
High             8 Students    8 Students    8 Students    8 Students
Medium           8 Students    8 Students    8 Students    8 Students
Low              8 Students    8 Students    8 Students    8 Students



In this example, students can be sorted on some dimension of ability, and
the amount of exposure given each student can (and would need to be)
carefully controlled. Only factors that can be managed in this manner can be
studied using this kind of design.
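The language laboratory design can be built by sorting pupils into ability levels first and then assigning them at random, within each level, to one of the time conditions. The sketch below assumes 32 pupils at each level so that every cell receives 8. The function name, the pupil identifiers, and the requirement that each level divide evenly are assumptions made for the example.

```python
import random


def assign_within_levels(pupils_by_level, conditions, seed=None):
    """Randomly assign pupils within each ability level to the conditions.

    Returns a dict keyed by (level, condition) holding each cell's pupils.
    """
    rng = random.Random(seed)
    design = {}
    for level, pupils in pupils_by_level.items():
        if len(pupils) % len(conditions) != 0:
            raise ValueError(f"level {level!r} cannot be split evenly")
        shuffled = list(pupils)
        rng.shuffle(shuffled)
        size = len(shuffled) // len(conditions)
        for i, condition in enumerate(conditions):
            design[(level, condition)] = shuffled[i * size:(i + 1) * size]
    return design


# Hypothetical pupil identifiers, 32 per ability level (illustration only).
pupils_by_level = {
    level: [f"{level}-{n}" for n in range(32)]
    for level in ("high", "medium", "low")
}
minutes = [10, 20, 30, 40]

design = assign_within_levels(pupils_by_level, minutes, seed=7)
```

Shuffling within each level, rather than across all pupils at once, is what guarantees that every time condition contains the same number of high, medium, and low pupils.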

While designs as extended as this have not been commonly used in
program evaluation, they are very powerful and have considerable potential
for situations in which the use of a reasonably sophisticated program
evaluation is critically important.



3. INTERPRETABILITY OF RESULTS

Whether results of measurement within a given design have significance
depends, in part, on the extent to which the outcomes are the result of
treatments and not of some other cause or combination of causes. Some
nontreatment causes might be the following:

1. Differential drop-out or attrition rates in groups being compared.
Even though you may exclude from your analysis those students who
have not been in attendance during the entire program, the transfer
of three of your "best" students can have a marked impact on your
results.

2. Failure to account for some related condition that directly
affected the results although it may not have been expected to.
For example, if the program requires students to spend an extra half
hour at school each day, and one of your objectives is to improve
attitudes toward school, you may find that nonprogram students who
get to leave earlier will have more positive attitudes.

3. Contamination between program and nonprogram students. This occurs,
for example, when your special program teacher becomes enthusiastic
about a filmstrip used in the program and lends it to a nonprogram
teacher.

4. The Hawthorne Effect. If much to-do is made about a new program,
improvements that were not caused by the program may occur because
of 1) the novelty, 2) an awareness that one is a participant in a
special group, or 3) a new environment which includes observers,
special procedures, equipment, and so on. Improvements caused by
these influences are usually short-termed and will probably
disappear over time.

5. Evaluator (or teacher) bias, the "Self-Fulfilling Prophecy." This
is a well-documented human hazard in evaluation design. The
evaluator (teacher or other observer) has preconceived ideas of
what is going to happen and sees what he or she expects to see,
ignoring indications contrary to expectations.



C-13 



6. Change in school programs, personnel, facilities, class size,
community factors, and other such conditions. These factors
can affect student performance though none may be related to
a program, its treatments, activities, and its evaluation.



4. WHAT TO DO TO AVOID PITFALLS

1. Be sure to take into account time spent on a given subject area.
   Pupils with high absentee rates will affect results. If your
   comparison group is spending twice as much time as your program
   group, time alone may prevent your program group from comparing
   favorably.

2. If you use a norm-referenced test, try to select one whose
   normative data were collected at about the same time of year
   (fall or spring) you plan to use the test.

3. Be sure you use the appropriate test level. If most students
   answer nearly all or hardly any of the items correctly,
   measurement will be both invalid and unreliable.

4. Be sure pretest and posttest are comparable; it is preferable to
   use different forms of the same level of the same test. This is
   not as critical if you have a good control or comparison group.
   If one of the other designs is used, however, there is no way
   to compare the results of two tests normed on different groups
   of students unless those tests have been statistically compared.
   In some cases, conversion tables may be available, as in the
   case of eight of the most commonly used reading tests for grades
   4, 5, & 6.*



*Loret, P.G. et al. Anchor test study: Equivalence and norms tables for
selected reading achievement tests (grades 4, 5, & 6). Office of Education
Report 74-305. Washington, D.C.: U.S. Government Printing Office, 1974.



5. If students have been selected for a program on the basis of
   extreme scores (disadvantaged or gifted), do not use the test that
   produced these scores in your regular program-evaluation testing.
   You will need tests of less or greater difficulty.

6. Do not lean heavily on grade-equivalent scores for measuring
   results. Design your study in such a way that raw scores can be
   converted to standard scores at the data-analysis stage. Grade-
   equivalent scores are suitable for descriptive purposes (see
   section on data analysis).

7. Be sure the comparison you make is between program group and
   comparison group at the start and at the finish. If you have
   selected a good comparison group and have an effective program,
   the initial differences between the two groups will not be
   significant, but the posttest differences will be.
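The conversion mentioned in item 6 can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the conventional z-score (raw score minus the mean, divided by the standard deviation) and T-score (50 + 10z), the function name and the raw scores are hypothetical, and in practice the mean and standard deviation would normally come from the test's norm group rather than from the class being evaluated.

```python
from statistics import mean, pstdev


def to_standard_scores(raw_scores):
    """Convert raw scores to z-scores and T-scores (T = 50 + 10z).

    Assumption: the mean and SD are taken from the scores supplied,
    which stands in for norm-group values that a real study would use.
    """
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD of the supplied scores
    if sd == 0:
        raise ValueError("all scores are identical; standard scores are undefined")
    z = [(x - m) / sd for x in raw_scores]
    t = [50 + 10 * zi for zi in z]
    return z, t


raw = [12, 15, 18, 21, 24]  # hypothetical raw scores
z_scores, t_scores = to_standard_scores(raw)
for r, z, t in zip(raw, z_scores, t_scores):
    print(f"raw={r:2d}  z={z:+.2f}  T={t:.1f}")
```

Standard scores of this kind can be averaged and compared across groups, which is why the text asks that raw scores be kept available for conversion.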



5. OTHER CONSIDERATIONS



Part of sound program-evaluation design is to plan how to determine whether
or not a given program would be effective in another setting. This may seem
simple until you consider the multitude of factors involved. Think, for
example, of the variations you can expect among students, teachers, schools,
and communities:

• STUDENT FACTORS - sex, age, attitude, courses taken, etc.

• TEACHER FACTORS - experience, enthusiasm, skills, intelligence, etc.

• SCHOOL FACTORS - size, budget, attitude toward innovation, etc.

• COMMUNITY FACTORS - income levels, parents' occupations, perceived
  value of schooling, etc.

However, the situation seems simpler again when we realize that not all of
these factors are likely to be important to a particular type of program.
The problem then is to determine which factors are relevant in testing to
see whether a program can be successfully exported from one place to another.



The first step is to review a very complete description of the
program itself. What is it about the program that is really essential?
What are the components that must be transferred to the new location?
What characterized the classes with which the program worked well?

Subsequent steps in planning for a possible exportation will carry
the program planner through all the stages in the new site that had been
pursued in the old one, but under an entirely different set of conditions.



6. MONITORING ACTIVITIES 



Whatever design is used, some procedure should be established to monitor the
activities to assure that the design maintains its integrity and that the
program is being implemented as intended. Are the new materials being used
with the appropriate groups? Do not underestimate the cross-fertilization
that may take place between teachers using different methods or materials.
A close check must be made periodically to determine whether the program the
evaluator thinks he is assessing is the one being carried out.






Learning Exercise 6 



LEARNING EXERCISE 6: EVALUATION DESIGN 



A 7th grade special reading program enrolls 100 students in four different
classes. Enough new reading materials for two classes have been purchased.
The teachers are prepared to implement an individualized approach with a
diagnostic/prescriptive series of tests and activities to accommodate
varied reading skills. The approach used until now has not provided for
individualization and has not used the new materials, but it has been
successful. The faculty and administration would like to know if the new
approach is really any better and which of the new ideas is more beneficial.
Your task is to develop an evaluation design to provide them with as much
information as possible. There are six different teachers whom you could
assign to the four different classes.



1. Plan a design to compare the old program with the new program using
   program-measurement notation or a matrix.

2. How would you assure that the groups of students were comparable?

3. How would you assign the teachers in this situation?

4. What would you do to guard against contamination?

5. What would you do to guard against any Hawthorne Effect?

6. What would you do about the amount of instructional time
   devoted to reading in each of the four groups?



ANSWERS

Plan a design to compare the old program with the new program.

                                    Individual Instruction
                                    No               Yes

   Random        New         Yes    25 Students      25 Students
   Assignment    Materials
                             No     25 Students      25 Students

   or

   Group 1:  premeasurement   new materials               postmeasurement
                              and old instruction

   Group 2:  premeasurement   individual instruction      postmeasurement
                              and old materials

   Group 3:  premeasurement   new materials and           postmeasurement
                              individual instruction

   Group 4:  premeasurement   old program                 postmeasurement

How would you assure that the groups of students were comparable?

   Randomly assign students to the four different groups. Compare
   results of pretest. Compare ethnic composition of group,
   occupational level of parents across groups, and number of boys
   and girls in each group. If real differences exist on any of
   these comparisons, interchange students to better balance the
   groups.
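The random assignment and pretest comparison described in this answer can be sketched as follows. It is a minimal illustration, not the guide's own procedure: the student labels, the made-up pretest scores, the seed values, and the function names are all assumptions introduced for the example.

```python
import random
from statistics import mean


def assign_to_groups(students, n_groups=4, seed=None):
    """Shuffle the roster and deal it out into n_groups groups."""
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    # Dealing every n-th student keeps group sizes as equal as possible.
    return [shuffled[i::n_groups] for i in range(n_groups)]


def pretest_means(groups, pretest):
    """Mean pretest score for each group, for a quick comparability check."""
    return [mean(pretest[s] for s in group) for group in groups]


students = [f"student_{i:03d}" for i in range(1, 101)]  # hypothetical roster
score_rng = random.Random(0)
pretest = {s: score_rng.randint(20, 60) for s in students}  # made-up scores

groups = assign_to_groups(students, seed=1977)
for number, m in enumerate(pretest_means(groups, pretest), start=1):
    print(f"Group {number}: n={len(groups[number - 1])}, pretest mean={m:.1f}")
```

The same kind of tally could be repeated for ethnic composition, parents' occupational level, and the number of boys and girls before deciding whether any students need to be interchanged.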

How would you assign the teachers in this situation?

   Compare training and experience of the six teachers. Use "soft"
   data if available (What reputation does each teacher have with
   his or her peers and students?). Select the four teachers most
   alike on these variables and assign them randomly to each of the
   four groups.






What would you do to guard against contamination?

   Meet with the teachers to explain how you are planning to
   study the effects of the new materials and individualized
   instruction. Seek their cooperation and explain how
   sharing of either information or materials can destroy
   the evidence needed to make good decisions.

What would you do to guard against any Hawthorne Effect?

   Refrain from making any predictions about the relative
   merits of the old program, new material, or individualized
   instruction. Be frank with the teachers; do not give the
   impression this is some kind of contest. Advise teachers
   not to confide in pupils that some kind of experiment is
   going on.

What would you do about the amount of instructional time devoted to
reading in each of the four groups?

   Presumably, in the 7th grade, classes are of the same length;
   thus, available instructional time is the same for all groups.
   If this is not so, arrange schedules so that each group does
   have the same amount of exposure.

   Absenteeism may, however, occur at different rates in the four
   groups. Therefore, teachers should be asked to keep attendance
   records. At the end of the year, all pupils on whom there are
   both pre- and posttest scores should also have complete records
   on attendance. Before analyzing tests, attendance rates for
   the four groups should be compared.



C-20 



7. INTRODUCTION TO SAMPLING

One of the early decisions in planning is to determine whether data should
be collected from the entire population involved in a program or from only a
representative part of that population. If only a portion of the population
is used, that portion is called a SAMPLE and the process used to select it
is called SAMPLING.

Sampling procedures are important because they allow the evaluator
to collect information more economically. A sample that is representative
and carefully selected permits the evaluator to make inferences and
generalizations, and to draw conclusions about an entire population by
applying the evaluation only to the sample.

People use sampling in everyday life, often unconsciously. A consumer
who samples a quart of milk would not need to drink the whole quart to
determine whether or not it was sour. However, sampling people is not
as simple as sampling milk. It is, therefore, important to specify the
sampling criteria, as these will define the population to which the
findings are expected to be generalized.

Sampling is especially appropriate for program evaluation. Most
evaluation activity in the classroom is for the purpose of grading
individual students. A score in these cases is needed for each individual
in the population. Information at the individual level is useful to the
teacher and necessary for student assessment. Many school personnel, as
a consequence, have become accustomed to thinking only in terms of whole
populations. However, program evaluation requires only information about
the effects of the program on students as a group, not as individuals.
Furthermore, contrary to popular belief, sample size can be comparatively
small and still provide reliable information, provided human
characteristics known to contribute to variability in responses are used
as the basis for sampling.



It is almost certain to be more economical and more effective to select
a sample of students in the program and administer data-collection
instruments to them.

Most of the advantages of sampling are related in some way to lower
costs. It is less expensive to gather and analyze a hundred scores than a
thousand scores. Expense is an especially important factor with some types
of instruments. One teacher can administer an hour test to 30-50 students
at one time within one hour. In the same time he could interview only a
few. Time and money may also be saved in scoring, especially with some
types of instruments.

As a result of lower costs, sampling may make it possible to use some
types of instruments which would not otherwise be feasible. For example,
suppose that the ideal method of data collection was by an interview.
Gathering data by this means requires not only large amounts of time but
also considerable training of those who will do the interviewing. For this
reason, an interview instrument might be rejected if data were needed from
the entire population. But if a sample is used, interviewing may be
feasible.

The type of instrument used is related to the type and number of
objectives that must be assessed. Often, the most important objective of
a program cannot be measured by a paper-and-pencil test. It may involve
an attitude or behavior outside the school setting. Anticipating the time
and trouble involved in observations or follow-up studies, the school may
decide that an evaluation to assess the success of such a hard-to-reach
objective is not realistic. But sampling might enable the school to
gather data on a limited number of cases, making it possible to carry out
an evaluation of that objective. Likewise, a school may want data on
several hundred objectives. A sampling process could be designed whereby
different groups of students are assessed on different groups of objectives.
The school thus obtains the needed data, but no one student is subjected to
exhaustive testing.



The economies resulting from sampling may enable the evaluation
program to use more than one measuring instrument for a given objective. For
example, a questionnaire may be developed to assess an attitude. But do
students' written responses reflect their actual feelings? Supplementary
use of interviews, observations, or open-ended questions with a sample
of the students might provide a way to validate the questionnaire results.
Computation skills can be assessed by a paper-and-pencil test in the math
classroom. But the program evaluator might also be interested in whether
these skills manifest themselves in a social studies class. Comments from
other teachers on a sample of students might help to assess the transfer
of those skills to other situations.

Sampling may be used in a variety of ways to provide new data or to
improve on data already collected. For example, teacher and course
evaluations by students, peers, and administrators are frequently
administered at the end of the school year. If such data were collected
using samples in the fall, winter, and spring, guides for instructional
improvement could be provided during the year.

In considering a sampling procedure, there are a number of preliminary
questions which need to be raised:

• Will a sample provide the representativeness which is necessary?

• Will sampling be more efficient than using the total population?

There are other questions. Use of a sample may result in some loss of
accuracy in the information obtained for program evaluation because a score
from every student in the population may not be available. This may raise
the question of what degree of accuracy loss is acceptable in return for
the saving of time and trouble. On the other hand, gathering data from a
sample rather than from the whole may yield more accurate results. A large
amount of data which is carelessly collected is useless. A smaller amount
collected under carefully controlled conditions is very useful indeed.

If a sampling procedure is to be used for all or part of the evaluation,
the questions of the size of the sample and the ways to obtain it must
be handled next. Relatively smaller samples can be used when the population
tends to be homogeneous, when larger differences are expected on the
factors measured, or when many of the factors contributing to variability
are controlled.

Relatively larger samples are appropriate when the population is
heterogeneous, if there are many uncontrolled factors, or if the differences
in the factors are expected to be slight.

The adequacy of the findings is more likely to be influenced by sample
design than by sample size. All of the advantages of sampling are based on
the assumption that the sample is representative or typical of the total
population. If the sample is not representative of the population, the data
obtained will be misleading. If we assess the effects of a reading program
from one teacher's class (a sample, but not a representative one), we may
have information about the effectiveness of that particular teacher but
not about the effectiveness of the program. A sample is representative to
the extent that it reflects the characteristics of the overall population
in that setting. The technique used to obtain representativeness is random
sampling, which is discussed in part 8.
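The guidance above on when smaller or larger samples are appropriate can be illustrated with the standard error of a sample mean. The formula used here (standard deviation divided by the square root of the sample size, for a simple random sample) is a standard statistical result and is not taken from this guide; the standard deviations and sample sizes are made-up values chosen only to show the pattern.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean under simple random sampling."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / math.sqrt(n)


# Hypothetical figures: a homogeneous group (small spread of scores)
# and a heterogeneous group (large spread of scores).
homogeneous_sd = 5.0
heterogeneous_sd = 15.0

print(standard_error(homogeneous_sd, 25))     # 1.0
print(standard_error(heterogeneous_sd, 25))   # 3.0
print(standard_error(heterogeneous_sd, 225))  # 1.0
```

Under these assumptions, the heterogeneous group needs nine times as many students to reach the precision that 25 students give for the homogeneous group, which is the sense in which homogeneous populations permit smaller samples.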



Learning Exercise 7



C-24 



LEARNING EXERCISE 7: SAMPLING CONSIDERATIONS 



Read the situation given below and attempt to identify some of the
difficulties and possible resolutions.

     A small school has used teacher-parent conferences
     as a substitute for report cards. The principal was
     responsible for the innovation and believes it to be
     successful, but he wishes to have the views of others
     who are involved in the process. He designs a
     questionnaire which is placed in teachers' boxes and sent
     home with students. The returned questionnaires are
     to be tallied for use in determining whether the
     conferences should be continued.



     Difficulties                        Possible Solutions

     1.                                  1.

     2.                                  2.

     3.                                  3.

     4.                                  4.

     5.                                  5.



ANSWERS

Teachers and parents are only part of the population in question. What
about students? The population has not been completely defined.

What proportion of teachers and parents will return the questionnaire?
Will those who do be representative?

The principal is known to favor the use of conferences. Will this
influence the number or nature of the responses or their interpretation?

It may be advisable to allow all parents and students to express their
views. The questionnaire could be made available to everyone with the
returns kept separate from those in the sample. Asking everyone to
respond will reduce the chance that some people will wonder why they
were excluded. Cost factors and local conditions will dictate whether
or not this is advisable.



C-26 



8. HOW TO SELECT A RANDOM SAMPLE



The method of selecting a random sample is not complicated, but there are
some common misconceptions as to how it is done. The selection must be done
in such a way that each person in the population has an equal chance to be
drawn. A commonly used procedure is to select every 10th or 20th name on a
list. If the first name is chosen randomly from among the first 10 or 20,
every name has the same chance of being included in the sample. Samples
differ, however. The list may be alphabetical, it may be organized by grade
or age, or it may be totally unorganized, or random.
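The every-10th-name procedure with a random starting point can be sketched as below. This is an illustrative sketch only; the roster, the interval, the seed, and the function name are hypothetical and do not come from the guide.

```python
import random


def systematic_sample(names, interval, seed=None):
    """Take every `interval`-th name, starting at a random position
    chosen from among the first `interval` names on the list."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(interval)  # 0 .. interval-1, each equally likely
    return names[start::interval]


roster = [f"name_{i:03d}" for i in range(1, 201)]  # hypothetical list of 200
sample = systematic_sample(roster, interval=10, seed=3)
print(len(sample), sample[:3])
```

Because the start is random, each name has a 1-in-10 chance of selection, but names that are neighbors on the list can never be chosen together, which is one reason the ordering of the list matters.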

Using an alphabetical list is easy and usually free of bias unless
there are periodic features in the list which coincide with the sampling
interval. For example, if some ethnic names tend to group themselves at
specific points in an alphabetical list, you could run the danger of
undersampling those groups.

Although this method is popular and used widely, using a table of
random numbers is a better procedure.*

                    Excerpt from
              A Table of Random Numbers

     Row
     Number

       1       50691    91653    88574    08675
       2       19787    66937    91769    13399
       3       16746    77983    18061    23664
       4       91039    16099    38824    00778
       5       11075    62081    88977    78676



One of the ways to use this table is to assign a number to each member
of the population. Then make two arbitrary decisions:

1. Decide to read the table either vertically or horizontally.

2. Select a starting point.



*Adapted from Walker, Helen M. and Lev, J. Statistical inference.
N.Y.: Holt, Rinehart and Winston, 1953.



The starting point can be anywhere (upper left-hand corner, third
column line 4, lower right-hand corner, etc.). Suppose, for purposes of
illustration, you decide to start in the upper left-hand corner and read
vertically. Suppose also that the population in question has 350 members
in it. You want to select 75 persons in the sample. The task is to locate
the first 75 numbers that fall in the range of 1 to 350. Starting with the
first number, 50691, look at the first 3 digits. 506 is not in this range.
Go to the next number. Since 197 is in the range, student 197 is the first
to be selected in the sample.
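The reading procedure just described can be sketched in code using the first column of the excerpt above. The function name and parameters are hypothetical, and skipping a number that has already been drawn is an assumption: the text does not say how repeats are handled.

```python
def select_from_table(table_numbers, population_size, sample_size, n_digits):
    """Read the leading digits of each table entry in order and keep
    those that fall between 1 and population_size.

    Assumption (not stated in the text): a number already drawn is skipped.
    """
    selected = []
    for number in table_numbers:
        candidate = int(number[:n_digits])
        if 1 <= candidate <= population_size and candidate not in selected:
            selected.append(candidate)
        if len(selected) == sample_size:
            break
    return selected


# First column of the excerpt, read vertically from the upper left corner.
first_column = ["50691", "19787", "16746", "91039", "11075"]
picked = select_from_table(first_column, population_size=350,
                           sample_size=75, n_digits=3)
print(picked)  # [197, 167, 110]
```

With only five entries the excerpt yields three students (506 and 910 fall outside the range); reaching all 75 would mean continuing down the column and on through the full table.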

Check Your Random Sample before Collecting Data

If your sample is small (e.g., classroom units), it is a good practice to
check the distribution of important group characteristics before collecting
data. By chance, a sample may be drawn which over- or under-represents some
variable you wish to study. For example, you may draw a classroom with 20
boys and 5 girls, or you may draw a sample without the ethnic representation
you wish to have. Some samplers advocate that a second or third independent
random sample be drawn if this should happen.
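A check of this kind can be sketched as a comparison of proportions in the sample and in the population. The helper names, the example data, and the idea of summarizing the check as a single "largest gap" are assumptions made for illustration; the guide leaves the judgment of what counts as too unrepresentative to the evaluator.

```python
from collections import Counter


def proportions(values):
    """Share of each category (e.g. 'boy', 'girl') in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def largest_gap(population_values, sample_values):
    """Largest difference between population and sample proportions
    across all categories found in the population."""
    pop = proportions(population_values)
    samp = proportions(sample_values)
    return max(abs(pop[c] - samp.get(c, 0.0)) for c in pop)


# Hypothetical data: an evenly split population and the lopsided
# classroom of 20 boys and 5 girls mentioned in the text.
population = ["boy"] * 150 + ["girl"] * 150
drawn = ["boy"] * 20 + ["girl"] * 5

print(round(largest_gap(population, drawn), 2))  # 0.3
```

A gap this large on a variable of interest is the kind of result that would prompt drawing a second independent random sample before any data are collected.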



Learning Exercise 8

LEARNING EXERCISE 8: RANDOM SAMPLING

Do the following exercise for randomly selecting five (5) students for
classroom observation:

a. Number each name in the list of students below.
   (Start with LEFT column.)

     ___ Paul Adler                 ___ Tom O'Toole
     ___ John Allen                 ___ Brian Peters
     ___ Mary Brummer               ___ Andrew Ramirez
     ___ Ken Duman                  ___ Margaret Smith
     ___ June Feng                  ___ Sheri Thompson
     ___ Scott Goldsmith            ___ Carmen Thurber
     ___ Ann Jamison                ___ Terry Ting
     ___ Yoko Kimoto                ___ Phyllis Unwin
     ___ Cathy Labovitz             ___ Rodney Woods
     ___ Jerry Mann                 ___ Roy York
     ___ Carolyn Mendez
     ___ Ramon Nunez
     ___ Pat O'Conner

b. Check (✓) which way you will read the Table of Random Numbers on
   page C-30:
   Horizontally (across) ___ or vertically (up and down) ___

c. Check (✓) where you will start reading the Table of Random Numbers:
   Left upper corner ___ or right bottom corner ___

d. Identify the number of digits necessary for selecting the sample from
   the list of students given above: ___



e. Using the Table of Random Numbers on page C-30, follow the instructions
   given below:

   1. Read the first two digits in each five-digit number.

   2. Use the Table of Random Numbers and identify the names of the
      students in the list on page C-28 that correspond to the numbers in
      the table.

   3. Using the method you choose to select the random numbers (see b. and
      c. on page C-28), enter the students' names in the appropriate column
      below.



Students for Classroom Observations

   Students     Method 1:      Method 2:      Method 3:      Method 4:
   To Be        Horizontal     Horizontal     Vertical       Vertical
   Observed     Left           Right          Left           Right
                Upper          Bottom         Upper          Bottom

      1.

      2.

      3.

      4.

      5.







C-30 



Row
No.               TABLE OF RANDOM NUMBERS (DIGITS)

 1   50691  91653  88574  08675  12700  32027  41034  56912  34264  77769
 2   19787  66937  91769  13399  96096  43165  72096  86350  23062  99419
 3   16746  77983  10861  23664  64557  78213  43857  68?09  20483  00618
 4   91039  16099  38824  00778  23058  76539  50584  71810  52589  32778
 5   11075  62081  88977  78676  53855  -----  13090  01708  89016  45111
 6   41230  92934  30342  29933  24597  72632  11111  63861  80454  -----
 7   64775  59803  45737  19025  -----  18914  03062
     (three further entries in row 7 illegible; column positions uncertain)
 8   42957  25204  00753  60284  85483  34984  86637  95354  80698  98750
 9   45881  59475  ?4445  98261  55252  50788  31295  16437  49497  22493
10   75104  45319  88471  ?5440  55309  63481  23616  64950  73291  10964
11   78614  -----  -----  84643  10455  95596  38158  75758  65628  104?8
12   69278  59274  67459  33563  98241  18097  65297  49803  99145  25320
13   58626  91259  13832  75095  08333  53845  74223  82690  89320  89565
14   81630  00339  07996  65249  66792  05555  79169  12136  44621  95904
15   74330  13688  02044  6591?  96007  32692  40473  56437  35671  95072
16   70829  66963  86390  26458  02685  41505  06239  68990  32915  89542
17   55084  58581  60759  20627  86682  -----  03648  38183  29823  68134
18   98845  17428  97397  62400  51284  92211  40593  82713  06067  4619?
19   48116  91870  16346  97406  54649  42039  58407  84248  45780  60547
20   82776  31709  71564  26258  07522  03823  92087  21809  25678  39987
21   86615  67618  07446  63129  07111  70516  67289  09457  48995  08043
22   82558  99260  69136  35099  68187  85382  09569  94211  57824  98100
23   08290  70291  74090  96503  56140  -----  27765  51740  07712  29816
24   95062  76310  81603  8682?  68370  46001  79205  35511  91239  52961
25   30361  66712  86801  29556  91232  98295  87322  99172  50009  27224
26   17390  9010?  -----  78715  61943  33315  39778  97149  08122  86388
27   05390  33046  63920  28733  42644  38972  98161  79861  88282  28279
28   06624  21114  33??9  20940  03732  399?3  89948  81060  36381  06027
29   58146  77295  33742  00135  2658?  -----  94846  18587  39327  71711
30   7?430  28645  62335  60393  71813  -----  09917  89100  93855  75617
31   16664  30164  22546  63538  79376  26865  61995  60418  31111  84170
32   56424  64680  81038  79364  23815  44002  38380  09864  35950  10760
33   95954  ??540  18554  63349  70259  03212  91950  16214  80378  56421
34   59007  56364  49965  6197?  32493  55404  85950  99606  46328  17887
35   1934?  87208  99853  40202  ?8553  78731  83463  19524  82502  13556
36   24505  870?7  35748  54865  40209  49466  94574  31406  64422  8?185
37   15086  92183  84632  36790  59608  00371  67456  55361  80669  75402
38   -----  02188  09364  -----  -----  -----  -----  1?4?4  -----  -----
39   40197  76835  14062  96067  70645  23695  59140  -----  18804  55529
40   31700  74753  22919  43207  83387  27820  12494  30041  ?8927  22668
41   14472  19372  23759  47116  81647  44946  9771?  41157  30?13  30842
42   18018  57089  98428  89075  77511  15194  69634  68269  52292  63404
43   16752  54266  76103  05268  41145  36100  73916  32462  01658  68565
44   47184  33660  96555  56656  18238  56888  29315  99813  47831  81395
45   93884  63945  06606  45545  29237  21?40  43552  02749  19963  23705

(? marks a digit, and ----- an entry, that is illegible in this copy.)






ANSWERS

   Students To
   Be Observed    Method 1       Method 2        Method 3       Method 4

        1         Y. Kimoto      R. York         C. Thurber     R. York
        2         R. Nunez       C. Thurber      A. Ramirez     R. Woods
        3         C. Thurber     J. Allen        C. Mendez      P. O'Conner
        4         P. O'Conner    P. Unwin        Y. Kimoto      M. Smith
        5         R. York        S. Goldsmith    M. Smith       J. Mann
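The Method 1 column of the answer key can be reproduced with a short sketch. It numbers the 23 students left column first, reads the first two digits of each entry in rows 1 and 2 of the table from left to right, and keeps numbers from 1 to 23. The function name is hypothetical, and skipping repeated numbers is an assumption not stated in the exercise.

```python
# Roster in the numbering order of step a (left column first, then right).
STUDENTS = [
    "Paul Adler", "John Allen", "Mary Brummer", "Ken Duman", "June Feng",
    "Scott Goldsmith", "Ann Jamison", "Yoko Kimoto", "Cathy Labovitz",
    "Jerry Mann", "Carolyn Mendez", "Ramon Nunez", "Pat O'Conner",
    "Tom O'Toole", "Brian Peters", "Andrew Ramirez", "Margaret Smith",
    "Sheri Thompson", "Carmen Thurber", "Terry Ting", "Phyllis Unwin",
    "Rodney Woods", "Roy York",
]

# Rows 1 and 2 of the Table of Random Numbers, read horizontally.
ROW_1 = ["50691", "91653", "88574", "08675", "12700",
         "32027", "41034", "56912", "34264", "77769"]
ROW_2 = ["19787", "66937", "91769", "13399", "96096",
         "43165", "72096", "86350", "23062", "99419"]


def pick_students(entries, roster, how_many, n_digits=2):
    """Map the leading digits of each table entry to a roster position.

    Assumption: a student already chosen is skipped if drawn again.
    """
    chosen = []
    for entry in entries:
        number = int(entry[:n_digits])
        if 1 <= number <= len(roster) and roster[number - 1] not in chosen:
            chosen.append(roster[number - 1])
        if len(chosen) == how_many:
            break
    return chosen


method_1 = pick_students(ROW_1 + ROW_2, STUDENTS, how_many=5)
print(method_1)
# ['Yoko Kimoto', 'Ramon Nunez', 'Carmen Thurber', "Pat O'Conner", 'Roy York']
```

The other three methods differ only in the order in which the table entries are fed to the function (right to left from the bottom row, or up and down a column).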




Stratified Random Sampling

Another way to handle the problem is to draw a stratified random sample in
which the population is first divided into categories or strata and then
random samples are selected for each category or stratum. The more of these
categories you include, the less you have to depend on randomization to
handle the extraneous or uncontrolled factors, for the units within a
sampled stratum will be alike on the category selected for stratifying.
Here are two examples:

A. Here is a population of 7th, 8th, and 9th
   grade boys and girls given one or two
   periods of reading instruction per day.

        POPULATION

B. We might begin stratifying the population
   by choosing the factor of SEX.

        BOYS    |    GIRLS

C. We might also choose the factor of AMOUNT OF
   READING INSTRUCTION. Divide the population
   again into the two levels of amount of reading
   instruction: one period and two periods per day.

                 ONE PERIOD              TWO PERIODS

        BOYS     Boys with one           Boys with two
                 reading period          reading periods
                 per day                 per day

        GIRLS    Girls with one          Girls with two
                 reading period          reading periods
                 per day                 per day



D. Finally, we might want each grade adequately
   represented. Divide the population again
   into three levels by grade: 7th, 8th, 9th.

               AMOUNT OF
        GRADE  READING        SEX: BOYS                SEX: GIRLS

        7th    One Period     7th grade boys,          7th grade girls,
                              one period per day       one period per day
        7th    Two Periods    7th grade boys,          7th grade girls,
                              two periods per day      two periods per day
        8th    One Period     8th grade boys,          8th grade girls,
                              one period per day       one period per day
        8th    Two Periods    8th grade boys,          8th grade girls,
                              two periods per day      two periods per day
        9th    One Period     9th grade boys,          9th grade girls,
                              one period per day       one period per day
        9th    Two Periods    9th grade boys,          9th grade girls,
                              two periods per day      two periods per day

   The strata or subgroups from which we are
   sampling are clearly becoming more and more
   homogeneous. Our originally fairly heterogeneous
   population with its characteristics
   of 7th, 8th, and 9th grade boys and girls
   with one or two periods of reading per day
   has become 12 smaller, more homogeneous
   subpopulations. Random samples from these
   smaller, relatively more homogeneous groups
   yield more representative samples, although
   fewer students are drawn into them.
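The division into 12 strata and the random draw within each can be sketched as follows. The population records, the field names (`sex`, `periods`, `grade`), the number drawn per stratum, and the function name are all hypothetical choices made for the illustration.

```python
import random
from collections import defaultdict


def stratified_sample(population, strata_keys, per_stratum, seed=None):
    """Group records by the stratifying fields, then draw a simple
    random sample of up to `per_stratum` records from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[tuple(person[k] for k in strata_keys)].append(person)
    sample = []
    for key in sorted(strata):  # fixed order keeps the draw reproducible
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical population: 10 students in each of the 12 subgroups.
population = [
    {"id": f"{sex}-{grade}-{periods}-{i}", "sex": sex,
     "grade": grade, "periods": periods}
    for sex in ("boy", "girl")
    for grade in (7, 8, 9)
    for periods in (1, 2)
    for i in range(10)
]

sample = stratified_sample(population, ("sex", "grade", "periods"),
                           per_stratum=3, seed=42)
print(len(sample))  # 36
```

Every one of the 12 subgroups is guaranteed three students in the sample, which a single unstratified draw of 36 from the 120 students would not guarantee.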

A. If, you randomly sample 48 items from a 
population of test items which contains 
within it si:c subtests,. 



SAMPLE/- A 8 
ITEMS FROM 
RNTLRH TEST 



then the, random sample may include more items 
from one/subtest than aoother by chance. 



C-34 



However, if you divide the population of items
into subtests first and randomly sample items
within subtests equally,

[Figure: six boxes, one per subtest, labeled SAMPLE 6 ITEMS FROM SUBTEST 1
through SAMPLE 6 ITEMS FROM SUBTEST 6]



then your sample includes an equal number of
items from each subtest and you can discuss
the results more definitively in terms of
subtests and the test as a whole. You could
also select a variable number of items per
subtest depending upon where you want to put
the emphasis or according to the proportionate
allocation of items in the original test. The
important thing is that you have more control
over composition of the final sample.
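The two allocation choices just described (equal numbers per subtest, or numbers proportionate to each subtest's share of the original test) can be sketched as follows. This is an illustrative Python sketch only; the function names, the subtest sizes, and the item labels are assumptions made for the example, not details from the guide.

```python
import random

def sample_items_by_subtest(subtests, allocation, rng):
    """Randomly draw allocation[name] items from each subtest.

    subtests: dict mapping subtest name -> list of item ids.
    allocation: dict mapping subtest name -> number of items to draw.
    """
    return {
        name: rng.sample(items, allocation[name])
        for name, items in subtests.items()
    }

def proportional_allocation(subtests, total):
    """Split `total` draws across subtests in proportion to their length.

    Uses largest-remainder rounding so the counts add up to `total`.
    """
    n_items = sum(len(items) for items in subtests.values())
    exact = {name: total * len(items) / n_items for name, items in subtests.items()}
    counts = {name: int(share) for name, share in exact.items()}
    leftovers = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in leftovers[: total - sum(counts.values())]:
        counts[name] += 1
    return counts

# Hypothetical 120-item test with six subtests of unequal length.
sizes = {"Subtest 1": 30, "Subtest 2": 30, "Subtest 3": 20,
         "Subtest 4": 20, "Subtest 5": 10, "Subtest 6": 10}
subtests = {name: [f"{name} item {i}" for i in range(1, n + 1)]
            for name, n in sizes.items()}
rng = random.Random(0)

equal = sample_items_by_subtest(subtests, {name: 6 for name in subtests}, rng)
proportional_counts = proportional_allocation(subtests, total=48)
proportional = sample_items_by_subtest(subtests, proportional_counts, rng)
print(proportional_counts)
```

Either way, the evaluator fixes the composition of the sample in advance instead of leaving it to chance.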



Matrix and Multistage Sampling

Another kind of sampling closely related to the stratified random type is
matrix sampling. In this instance, the instruments or items are sampled.
If data are needed on a large number of objectives, for example, rather than
subjecting one sample of students to a lengthy test or series of tests, the
evaluator administers samples of the test items or tests to different
samples of the population.
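As a rough sketch of matrix sampling, the Python below splits a pool of items into short forms and gives each form to a different randomly formed group of students. The function name `matrix_sample`, the 60-item pool, and the 90 students are hypothetical; in practice the student list passed in could itself be a sample of the population.

```python
import random

def matrix_sample(items, students, n_forms, rng):
    """Split the items into short forms and give each form to a different
    random group of students, so nobody takes the whole item pool."""
    items = list(items)
    students = list(students)
    rng.shuffle(items)
    rng.shuffle(students)
    # Deal shuffled items and students round-robin into n_forms piles.
    forms = [items[i::n_forms] for i in range(n_forms)]
    groups = [students[i::n_forms] for i in range(n_forms)]
    return list(zip(forms, groups))

assignments = matrix_sample(
    items=range(1, 61),
    students=[f"student{i}" for i in range(90)],
    n_forms=3,
    rng=random.Random(0),
)
for form, group in assignments:
    print(len(form), "items ->", len(group), "students")
```

Each student answers only a third of the items, yet every item is answered by a random group, so results can still be reported for the full set of objectives at the group level.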

Multistage, or cluster, sampling is a technique of random sampling that
is frequently used. The most used method in surveys is the successive random
sampling of units (or groups and subgroups). For example, in a statewide
evaluation, the evaluator first randomly selects districts, then schools
within districts, then classrooms within schools. This amounts to a narrowing
down, in stages, of the sample with a randomness procedure at each stage.
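The district, school, classroom sequence can be sketched as three nested random draws. This Python sketch is illustrative only; the nested-dictionary layout, the function name `multistage_sample`, and the counts chosen at each stage are assumptions.

```python
import random

def multistage_sample(state, n_districts, n_schools, n_classrooms, rng):
    """Select districts, then schools within them, then classrooms within those.

    state: nested dict {district: {school: [classroom, ...]}} (hypothetical layout).
    """
    selected = []
    for district in rng.sample(sorted(state), n_districts):
        schools = state[district]
        for school in rng.sample(sorted(schools), min(n_schools, len(schools))):
            classrooms = schools[school]
            for room in rng.sample(classrooms, min(n_classrooms, len(classrooms))):
                selected.append((district, school, room))
    return selected

# Made-up state: 5 districts, 4 schools each, 6 classrooms per school.
state = {
    f"District {d}": {
        f"School {d}.{s}": [f"Room {d}.{s}.{c}" for c in range(1, 7)]
        for s in range(1, 5)
    }
    for d in range(1, 6)
}

picked = multistage_sample(state, n_districts=2, n_schools=2, n_classrooms=3,
                           rng=random.Random(0))
print(len(picked), "classrooms selected")
```

Only the selected districts and schools need to be listed in full, which is what makes the approach economical for statewide work.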






9. HOW LARGE SHOULD A SAMPLE BE?

Sampling within a school or district for program evaluation is not
practical except for the larger schools and districts. However, sampling
of parent or community groups is practical for all except the smallest of
communities. The size of the population and the amount of error the
evaluator is willing to tolerate are what determine the practicality of
using a sample. "Population" in this sense means the group for whom you
want information; it may be all fifth graders or all parents of secondary
school students enrolled in noncollege preparatory curricula or all adults
in the community of voting age.

Whenever a sample is used, the inevitable question which must be faced
is, "What would the results have been if everyone in the population had been
included?" The sample, if appropriately drawn, gives an estimate of what the
results would have been for everyone in the population had it been possible
to include everyone. However, there is always presumed to be some error
in this estimate. The evaluator does not know how much error there is, but
he or she can control the expected amount of error by selecting a sample
proportionate to the number of persons in the total population. Suppose,
for example, you ask a sample of parents if they approve some organizational
change the board of education is considering. The change is a fairly major
one, and you want to be sure that the proportion of sampled parents who
approve comes within 5 percentage points (with 90 percent certainty*) of
what the results would have been if all parents had been asked. If there
are 2,000 parents, you would need to obtain 322 responses to achieve this
degree of accuracy. The table that follows was developed using this degree
of accuracy and shows sample sizes for various population sizes.



*Sampling cannot guarantee certain variation 100 percent of the time, but
it is possible to know how sure you can be of your results.






TABLE FOR DETERMINING SAMPLE SIZE FROM A GIVEN POPULATION

        N       S            N       S            N       S

       10      10          220     140         1200     291
       15      14          230     144         1300     297
       20      19          240     148         1400     302
       25      24          250     152         1500     306
       30      28          260     155         1600     310
       35      32          270     159         1700     313
       40      36          280     162         1800     317
       45      40          290     165         1900     320
       50      44          300     169         2000     322
       55      48          320     175         2200     327
       60      52          340     181         2400     331
       65      56          360     186         2600     335
       70      59          380     191         2800     338
       75      63          400     196         3000     341
       80      66          420     201         3500     346
       85      70          440     205         4000     351
       90      73          460     210         4500     354
       95      76          480     214         5000     357
      100      80          500     217         6000     361
      110      86          550     226         7000     364
      120      92          600     234         8000     367
      130      97          650     242         9000     368
      140     103          700     248        10000     370
      150     108          750     254        15000     375
      160     113          800     260        20000     377
      170     118          850     265        30000     379
      180     123          900     269        40000     380
      190     127          950     274        50000     381
      200     132         1000     278        75000     382
      210     136         1100     285      1000000     384

NOTE: N is population size.
      S is sample size.



Krejcie, Robert V. and Morgan, Daryle W. Determining sample size for
research activities. Educational and Psychological Measurement, Vol. 30,
1970, 607-610.
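For populations that fall between the rows of the table, the sample size can be computed directly. The sketch below uses the finite-population formula as I recall it from the cited Krejcie and Morgan article: s = X²NP(1 - P) / [d²(N - 1) + X²P(1 - P)]. The constants (chi-square value X² = 3.841 for one degree of freedom, P = 0.5, d = 0.05) are assumptions on my part rather than figures stated in this guide, although they do reproduce table entries such as 322 for a population of 2,000. The function name `sample_size` is hypothetical.

```python
def sample_size(population, chi_square=3.841, proportion=0.5, margin=0.05):
    """Required sample size for a finite population of the given size.

    chi_square: table value of chi-square for 1 degree of freedom (assumed 3.841).
    proportion: assumed population proportion (0.5 gives the largest sample).
    margin: tolerated error as a proportion (0.05 = 5 percentage points).
    """
    variance_term = chi_square * proportion * (1 - proportion)
    numerator = variance_term * population
    denominator = margin ** 2 * (population - 1) + variance_term
    return round(numerator / denominator)

for n in (100, 500, 2000, 10000, 1000000):
    print(n, sample_size(n))
```

Note how slowly the required sample grows: moving from 10,000 to 1,000,000 people adds only a handful of responses, which is why sampling pays off most in large populations.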




Clearly, sampling within classrooms is not appropriate for program
evaluation purposes. However, sampling on small populations (such as a
classroom) may be used for other purposes. Exploratory or pilot studies may
give indications or hunches which can then be studied more thoroughly with
the larger groups. Groups of between 10 and 30 can be used advantageously
for such purposes and are easier to handle computationally.



10. TWO CASE STUDIES ON SAMPLING

Number 1

The English teachers at a high school had been receiving considerable
criticism from some members of the community whose sons and daughters had
not performed well on the Scholastic Achievement Tests. The English teachers
did not want their program judged on this single criterion and therefore
wanted to gather and publish information about achievement on the full
spectrum of objectives in the English program. They developed a plan and
identified and developed a series of instruments to assess objectives in
five different areas:

• Reading: A standardized test will be used with scoring by the publisher.

• Spelling: Teachers will read a list of commonly misspelled words to
  students, who will write the words on a piece of paper.

• Familiarity with literature: A multiple-choice test developed by the
  English department faculty will be given by the teachers.

• Appreciation of literature: Students will be interviewed by someone
  other than their own teacher.

• Writing: Teachers will evaluate a paragraph written by each student.






Having done this planning, the teachers realized that the evaluation
process would consume enormous amounts of teacher and student time and would
cost considerable money unless some economies were possible. They decided
to explore the possibilities of using sampling procedures.

As an educator recently sensitized to some of the advantages and
disadvantages of sampling procedures, what advice would you offer? For which
objectives would you sample and for which would you use the entire student
body? Why?

The following suggestions may be similar to some of your considerations: 

• Reading: The costs of administering and scoring these tests are not
  great. The information may be needed for student educational guidance and
  placement anyway. Therefore, it is probably appropriate to test all
  students at least at entrance and before they leave school. Students with
  deficiencies might be tested at intervals. Moreover, all parents are
  probably concerned about their own children's achievement on this measure.

• Spelling: The situation is analogous to the reading objective, and the
  same recommendation would seem appropriate.

• Familiarity with literature: This test is fairly easy to administer and
  score. However, it may not be equally important for all students. Nor
  will all students receive equal exposure. Sampling of different groups of
  students based on courses taken or tentative plans after high school
  would seem sensible.

• Appreciation of literature: Since an interview is planned, the time
  required will be considerable. Training will be necessary if the
  interviewer is to avoid biasing the responses of the person interviewed.
  Since the interviewer is not to be the student's own teacher, arranging
  for someone to do the interviewing may be difficult or costly. Sampling
  would seem very appropriate.

• Writing: Sampling would provide convenience. Students are undoubtedly
  writing paragraphs for some of their regular assignments. These regular
  assignments could be the source of the sample, provided the teachers
  could agree on a common assignment for this purpose. To avoid bias, each
  teacher might read and criticize the first paragraph of a regular
  assignment turned in to another teacher. In this case, all students might
  be assessed, with sampling to avoid a great amount of testing and
  evaluating concentrated at one or two times of the year. Sampling of
  regular assignments might also encourage consistency in quality of
  writing as opposed to a one-time effort.

Sampling is probably used less frequently than it could be for program
evaluation. It should be remembered that sampling procedures can be applied
not only to students but also to test items, time of day, teacher
performance, textbook content, and so on. Sampling fits any of the many larger
populations about which we may want information. We can often get as much
information as we need by applying our measurement to a relatively small
part of the whole.

Number 2

The preceding discussions have emphasized the use of samples and especially
of random samples. The election of 1936 provided one of the classic cases
of nonrandom sampling and its consequences. The Literary Digest magazine
had correctly predicted several preceding elections and used its tried and
tested technique in 1936 when ballots were mailed and 2,300,000 returns were
received. Based on analysis of those data, the Literary Digest assured
Republican Alfred Landon that he would defeat Franklin Roosevelt by 241
electoral votes to 99. The election did not work out that way, and the
case illustrates two important points:

1. The Literary Digest used a sample of 2,300,000. In recent years, Gallup
   polls have used 2,000 to 4,000 but have obtained more accurate results.

2. The Literary Digest drew its sample from lists of people such as
   telephone directories. But at that time, not all people had telephones.
   In particular, low-income people were less likely to have telephones,
   especially during the depression. In the nineteen-twenties, income level
   had not been a significant predictor of political preference as it was
   in 1936. The Digest had drawn conclusions about the voting population
   from a sample which was not representative of the population. Had the
   sample of voters been random, each income group would have shown up in
   the sample proportionately to its share of the population. The same
   would have been true for geographic regions, race, age, and any number
   of other factors that might have influenced voting patterns.
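Both points can be illustrated with a small simulation. Everything in the Python sketch below is invented for illustration: the population of 100,000 voters, the shares owning telephones, and the levels of support for a candidate "A" are made-up figures, not data from the 1936 election. The sketch compares a large sample drawn only from telephone owners with a much smaller random sample drawn from everyone.

```python
import random

def build_population():
    """Return a list of (has_phone, supports_a) tuples with fixed, made-up counts."""
    groups = [
        # (people, share with a telephone, share supporting candidate A)
        (60_000, 0.20, 0.70),  # lower-income voters
        (40_000, 0.80, 0.35),  # higher-income voters
    ]
    population = []
    for people, phone_share, support_share in groups:
        with_phone = round(people * phone_share)
        for has_phone, count in ((True, with_phone), (False, people - with_phone)):
            supporters = round(count * support_share)
            population += [(has_phone, True)] * supporters
            population += [(has_phone, False)] * (count - supporters)
    return population

def share_supporting(voters):
    """Proportion of the given voters who support candidate A."""
    return sum(1 for _, supports in voters if supports) / len(voters)

rng = random.Random(1936)
population = build_population()
true_share = share_supporting(population)

phone_list = [voter for voter in population if voter[0]]
big_biased = share_supporting(rng.sample(phone_list, 20_000))
small_random = share_supporting(rng.sample(population, 2_000))

print(f"true share for A:             {true_share:.3f}")
print(f"20,000 from telephone list:   {big_biased:.3f}")
print(f"2,000 drawn at random:        {small_random:.3f}")
```

In this made-up population the large telephone-list sample misses the true figure by far more than the small random sample does, because size cannot compensate for a list that leaves out part of the population.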



11. A FINAL WORD ON DESIGN AND SAMPLING

It is clear that good evaluation design produces information that is valuable
to schools. It is also clear that the cost of translating such design into a
working evaluation of a program is frequently high. Mistakes must be kept to
a minimum. Thus, it is essential to approach the design of an evaluation
from the pragmatic as well as the theoretical point of view.

Here are some practical considerations to keep in mind when approaching
any program evaluation design:

1. You need to keep records. In order to compare present programs
   with past programs and to gauge progress, you must have adequate
   information. You may need the cooperation of other schools.

2. Use the same or comparable instruments in different times and
   places. If you keep changing instruments, you can never make
   comparisons over time. Fortunately, sampling procedures make
   it possible to continue using the same instruments while also
   adopting new ones.

3. Resist the natural impulse to treat all students alike if you
   want to assess the effects of different programs.

4. You need to be able to hold programs still long enough to look
   at them. This means that innovation cannot be constant, but
   should progress in planned and measured increments of
   improvement and change.

5. You need to plan much farther in advance and include planning
   for evaluation as part of program planning.

6. You need to communicate clearly to students, parents, and
   teachers why you are doing what you are doing.

In this section, we have also indicated ways in which evaluation
designs can be applied more economically through the use of sampling
techniques. Sampling is particularly suited to program evaluation which
is based on information about groups, not individuals. Sampling exposes
only a portion of the program population to evaluation procedures; if done
properly, however, the information produced on a relatively small segment
of the population will be comparable to what might have been produced on
the entire population.



PROGRAM EVALUATOR'S GUIDE 



Section D 



SELECT OR DEVELOP ASSESSMENT INSTRUMENTS



The Evaluation Improvement Program



PRECIS

For the most part, the program evaluator will prefer to select assessment
instruments from those already available, though under some circumstances,
local development of specially devised instruments will be more appropriate.
Many types of instruments are available; which type to select depends, of
course, on the objectives of the program.

An instrument should be selected for each discrete learning outcome that
is expected to result from the program. Skills, abilities, knowledge, and
understanding are outcomes best measured by tests. Attitudes, feelings, and
appreciations are more appropriately measured by questionnaires and structured
interviews. Behaviors, interactions, and practices may be more satisfactorily
assessed by means of observation instruments. High-priority objectives
will require multiple measures.

Careful selection and development of instruments for program evaluation
help assure that all the information needed to judge how effectively objectives
are met will be available when the data-collection effort is complete.



CONTENTS

INTRODUCTION
     Scenario 1
     Scenario 2
     Scenario 3

OVERVIEW OF THE EVALUATION MODEL

CONSIDERATIONS IN SELECTING ASSESSMENT INSTRUMENTS
     Standardization
     Reliability and Validity

LEARNING EXERCISE 9: RELIABILITY AND VALIDITY

TYPES OF ASSESSMENT INSTRUMENTS
     Achievement Tests
     [entry illegible in source]
     Questionnaires

LEARNING EXERCISE 10: JUDGING ITEMS
     Observational Techniques and Instruments

LEARNING EXERCISE 11: CRITICIZING A CLASSROOM OBSERVATION INSTRUMENT
     Other Behaviors

SOURCES OF INFORMATION ABOUT INSTRUMENTS
     Use Multiple Measures Whenever Possible

LEARNING EXERCISE 12: SELECTING NORM-REFERENCED TESTS

LEARNING EXERCISE 13: SELECTING APPROPRIATE INSTRUMENTS

LOCATING EXISTING INSTRUMENTS vs. DEVELOPING ASSESSMENT INSTRUMENTS LOCALLY
     Developing Instruments

REVIEW



1. INTRODUCTION

Now that you have seen the kinds of questions that must be addressed in the
planning stages of a program evaluation, we are going to take time, through a
set of three scenarios, to show what more typically happens in school districts.



List of Characters
(in order of appearance)

                                   Personality Type

1. Mrs. Smith                      Classroom teacher, disgruntled with
                                   evaluation report on her classroom

2. Principal                       Sympathetic to Mrs. Smith but
                                   painfully trying to meet state
                                   reporting requirements

3. Chairman of the Board           Responsible member of the community
   of Education                    trying to look out for the school's
                                   and the taxpayer's interest

4. Mr. Worth                       The evaluator, who is trying to do
                                   the best job he can within the
                                   constraints imposed

5. Parent 1                        Mother of two children in school; for
                                   the program

6. Parent 2                        Father of children in school, but
                                   wants to know he's getting his
                                   money's worth

7. Boy                             An eighth grader who thinks the
                                   program is "far out"

8. Mr. Fairchild                   A district supervisor of evaluation
                                   who audits evaluation plans



Scenario 1: The Report: Useful to Whom and for What?

Setting: (A principal's office. You are the evaluator of a
reading program waiting to see the principal of the school,
who is busy talking with Mrs. Smith. You know Mrs. Smith.
She is a reading teacher and you have been in her classroom
once or twice. Your impression is that she is a
very outspoken type of person and that she often talks a
great deal. The door of the principal's office is not
closed, and you hear the following conversation.)

Mrs. Smith: Did you read that dumb evaluation report? That evaluator
doesn't know what he's talking about. He's only been in
my classroom twice during the whole year and then only for
twenty minutes! Yet he concluded NO SIGNIFICANT GAINS.
I'm a teacher. I don't know about all this evaluation
stuff. All I know is my kids and the terrific progress
that some of them are making. The whole class has improved
in reading! They enjoy reading in a way you wouldn't
believe! I'm proud of them, and I certainly won't let my
kids be put down by some fancy evaluator.

Principal: Mrs. Smith, we all know that evaluation reports don't show
what's really going on in the classroom. They're not
supposed to. These reports are only for the central
office and the capital. They make us send a report. I
don't think anybody takes the time to read them. They
certainly don't affect my feelings about our program.

Mrs. Smith: That may be so. But I think people in the capital have
the right to know about the good things that are happening
here. You don't need to be a professional evaluator to
figure out by yourself that all the students are different.
He used just one test, looked at some one thing that
he called the MEAN and decided that the program was no
good. That evaluator certainly is MEAN, not to mention
unfair and overpaid. Many of these tests aren't even
related to what we're teaching in our reading program.
It's obvious that the individualized instruction program
that we started with such a big effort is helping students.
Even the parents noticed the improvement in their kids.
We had an individualized program just like we talked about
in those workshops. Incidentally, those were really good
workshops and there isn't a word in that evaluation report
about those either.

Scenario 2: The Program: Payoff or Ripoff?

Setting: (A school auditorium. You are in the audience along with
many interested parents at a meeting of the Board of
Education. On the stage, seated at a table, are members
of the board, the principal, the program planner/evaluator,
and a teacher who participated in the program. The main
item on the agenda is the experimental [demonstration]
reading program. The Board is meeting to collect facts
pertaining to the impact of the program. On the basis of
this information, the decision will be made as to whether
or not to continue the program. The meeting has been
under way for a short time.)

Chairman: ... so we've called this meeting as part of our
responsibility to the community to see that its school tax
dollar is getting the best return for the investment.
I have the report that Mr. Worth, the evaluator, has
prepared. Mr. Worth, let me start by asking you a broad,
general question. Given the fact that you find "no
significant gain," do you feel that there is any justification for
carrying the program for a second year?

Mr. Worth: That's not a simple question to answer. You're quite
right in pointing out that we found no significant gain in
reading achievement. However, on page 37 of the report, I
presented my criticism of the evaluation. Let me repeat
now, for the sake of those who may not have seen the
report, that my total evaluation budget was $2,000.
Furthermore, no consideration was given to an evaluation
until school opened. In this situation, we tried simply to
meet the minimum district and state requirements. What we
did was administer a pretest in September and another form
of the same test in June. It was a standardized reading
achievement test and it didn't really reflect all parts of
the experimental program. If I had it to do over, properly,
you can be sure that this evaluation would look quite
different. For example, you'd have data on how the kids
feel about the program.

Mrs. Smith: Mr. Chairman, I was quite disappointed in the evaluation
report. I was telling Mr. Jackson, our principal, not too
long ago how misleading I thought it was. Do you know that
some students were so excited about the program that they'd
come early in the morning and often stay after school just
to work out extra assignments?

Chairman: Thank you, Mrs. Smith. We made this an open meeting because
we felt that interested members of the community ought to
be heard. Does anyone in the audience wish to ask a
question or make a comment? Please speak up and start by
stating your name. . . Yes, madam.

Parent 1: My name is Mary Thatcher. Two of my children are in
Gardenview Elementary School. Tommy's going into grade 7
this year. We've heard a lot about the new reading program.
My husband and I know several parents with children in the
program. They're extremely pleased with the progress of
their children. Tommy could sure use individualized
instruction; he's a good boy, but he doesn't read too well,
and this program sure sounds like just what he needs.

Chairman: Thank you, Mrs. Thatcher. . . Your name, sir?

Parent 2: I'm Farley Grant. We've lived over on Oak Street, paying
property taxes, for well on 10 years now. Now that we
have school-age children, we want them to learn something.
I know that you can't get by these days without knowing
how to read, and I want my kids to get the best possible
start. But I want to know that my tax money isn't being
wasted on some fad or other. I learned to read without
all these fancy frills.

Chairman: Mr. Grant, we appreciate your views. That's why we're
having this hearing. . . Yes, son, tell us your name.

Boy: My name's Jerry Bilford. I'm in eighth grade and I just want
to say that the program is really far out. I mean, I used
to hate reading, but now it's really got me going and I
think you should let it go on.

Chairman: Well, ladies and gentlemen, I must confess that we're
really in a dilemma. I should let you know that ever
since word got out that the district was considering
dropping this program, my office has received quite a few
letters and phone calls urging the board to keep the
program alive. Hearing from you today, I get the same
feeling. But, frankly, there's no hard evidence to say
that the program's worth the investment.

Mr. Worth, this Board owes you an apology. Your
counsel about the importance of an adequate evaluation
fell on deaf ears last summer. It's been a costly
lesson for all concerned.

The members of the Board would like some more time to
digest the information we've gathered today. We'll probably
continue the program. Mr. Worth, we'd like to ask you
to draw up the plans for what you consider an adequate
evaluation, along with a proposed budget, for consideration
by the district office.



Scenario 3: Criticizing the Evaluation Design

Setting: (Mr. Worth has sought out his colleague and most respected
critic, Mr. Fairchild, to discuss the task the School
Board chairman has given him. Mr. Fairchild is an auditor.
His role is to criticize evaluation plans to ensure that
they provide adequate information about whether or not a
program is meeting its objectives.)

Mr. Fairchild: Tom, I understand that the School Board chairman
apologized to you in public for criticizing your evaluation of
the reading program.

Mr. Worth: Yes, he did, Steve, and I don't mind telling you that I
felt relieved to have him off my back.

Mr. Fairchild: I don't blame you. After all, if the man knew something
about statistics, he could really have embarrassed you
(laughing good-naturedly). Tom, confess now, you didn't
really give this evaluation much thought, did you?

Mr. Worth: Just between you and me, I was so damned mad at their
attitude toward evaluation, I wasn't really enthusiastic.
They don't look at the data anyway; they just do what's
popular.



Mr. Fairchild: Could you give some examples of what you would have done
differently, even within the limits which were imposed?

Mr. Worth: Well, for starters, I could have argued more forcefully
against the need to test every student. With the money
saved on the cost of standardized tests, I could have
afforded to administer a criterion-referenced test, to have
interviewed some students, and to have conducted a more
sophisticated data analysis. I also would have fought to
delay pretesting by one week. That would have given us
enough time to plan a stratified random sample of students.
You know, Mrs. Smith was right. The kids who really needed
individualization did benefit from the program. If we
could have selected two samples of students (those
identified by their teacher as those who needed individualization
and those who didn't), I bet we could have shown significant
gains for the first group.

Mr. Fairchild: Maybe so. In that case, you would have evidence of a
relationship between student characteristics and special
treatment, a really interesting and useful finding. But
let's turn to your present task. What kind of proposal are
you going to make for this year's evaluation?

Mr. Worth: It's going to be based on four points:

1. Clear specifications of program objectives in terms of
   achievement gains, attitude changes, and changes in
   incidental behaviors on the part of the participants

2. Multiple measures for each objective, using a wide
   range of instruments

3. Stratified random sampling, and inclusion of a variety
   of student factors, instructional factors, and
   environmental factors in the evaluation design

4. Use of sensitive statistical tests of significance

Mr. Fairchild: Well, Tom, you've shown me once again what a skilled job of
conceptualizing you're capable of. But you're an evaluator,
not a statistician; you'd be the first to admit that. How
do you propose to handle the data analysis?

Mr. Worth: You're right, Steve. One of the things I want to budget for
is the service of a competent statistician so we can
get the most out of the data.

Mr. Fairchild: Let's do some reality testing now. Are you really going to
be able to carry out this plan with your limited time and
limited staff? Perhaps you'd better give some thought to
priorities and prepare some alternatives in case you're
not given more released time to devote to the evaluation.

Mr. Worth: Yes, that makes good sense. I'll work up several possible
plans (what I consider the ideal evaluation, and two or
three alternatives which meet minimal criteria) and a cost
estimate for each.

Mr. Fairchild: Perhaps you should also prepare and submit a list of those
parts of the evaluation you consider essential to the
program planner. Then you and he can meet and come to an
agreement on an evaluation plan that addresses his needs as
well as yours.



Points Illustrated in the Scenarios 



To be thorough and to be credible, evaluation should encompass the processes
as well as the outcomes of a program.

Instruments used should be matched carefully to specific program objectives.
More than one instrument should be used for each program objective.

Evaluation should include as many of the program objectives as possible,
not just one.

The people in the program should be consulted and involved in planning and
conducting the evaluation.

If evaluation is to be respected, it must provide information useful to
people in the program.

Adequate time and money should be provided for the type and amount of
evaluation needed.

Multiple instruments should be used to measure all feasible program goals.

Technical procedures should be used to facilitate economical and more
thorough evaluation.

All evaluators need outside assistance occasionally and should seek it when
appropriate.

Evaluation plans should be designed in accordance with the time and resources
available.



Summarizing the Scenarios

We've followed Mr. Worth as he overheard a conversation between the principal
and Mrs. Smith, who was complaining about the quality of the evaluation.

We've attended a meeting of the School Board in which it was made clear
that while the community, teachers, and School Board are convinced that
the program has some merit, it was evident that the evaluation report did not
support that proposition.

And we've overheard a conversation between Mr. Worth and his colleague,
Mr. Fairchild, in which Mr. Worth has sketched an evaluation plan which
will provide the kind of information that was absent in the previous year's
evaluation.

We have used these scenarios for two major purposes:

• Our discussion has established some general principles about
  evaluation.

• The reading program discussed in the scenarios is the setting for more
  discussions of evaluation skills in the subsequent parts of this
  Guide.






2. OVERVIEW OF THE EVALUATION MODEL

To place the selection of instruments in proper perspective, it may help to
review again the elements that go into program evaluation.



Elements in Program Evaluation

• Purpose and Requirements

• Plan and Procedures

• Evaluation Design

• Assessment Instruments

• Data Collection

• Data Analysis

• Preparation and Interpretation
  of Reports

• Application of Findings



This section will present some basic considerations in the selection of
evaluation instruments, a review of a wide variety of instrument types, a
discussion on sources of information about instruments, and an outline of
important steps which must be taken if instruments need to be developed
locally.



3. CONSIDERATIONS IN SELECTING ASSESSMENT INSTRUMENTS

Identifying the appropriate evaluation instruments for measuring pupil
attainment of a program objective is one of the prime tasks involved in the
preparation of a useful evaluation plan. It is also one of the most difficult. The
chief criterion for selecting appropriate instruments is whether or not they
can adequately measure the outcomes specified by the performance objectives.






Important questions to consider in identifying assessment instruments are as
follows:

• Does the instrument measure what it is supposed to measure? This question
refers to the validity of the assessment instrument. Three kinds of validity
are important to consider. First is content validity, which assesses whether
the test measures the content of the program being evaluated. Second is
concurrent validity, which compares the test scores with other similar measures.
Third is predictive validity, which tells how well the score can be used to
predict future performance. A fourth and more difficult kind of validity is
construct validity, which refers to the psychological processes revealed by
the pupil's behavior during the test. For example, comprehension skills
measured on certain reading tests are thought to evaluate a child's ability to
make inferences. Evidence should be offered by the test publisher that
questions on the test actually do measure this ability.

• If the instrument is administered more than once to similar groups or
the same group, will it yield consistent results? This question refers to the
reliability of the assessment instrument. When choosing a test, the user will
want it to be a reliable measure of how much a pupil knows and how well he is
able to apply his skills. The test results should be earned and under no
circumstances arrived at by luck, guessing, or other chance factors. The test
should be constructed so that one has confidence that the score the pupil
receives will be similar to the score he would receive if the test were
administered to the same person again.

• Is the instrument appropriate for use on the population to be assessed?
This question refers to the following:

  - Grade-level appropriateness

  - Ethnic appropriateness

  - Compatibility of norms to the groups

  - Appropriateness of instructional content

• Does the instrument yield objective data? If it does not, how will you
control for observed differences among those collecting the data?






• Is the instrument easy to administer and score? For example, interviews
using structured guides are generally difficult to administer and to score,
although sometimes they may be needed measures.

• What time and resources are required to administer and score the instrument?
As an example, individually administered instruments require more time and
resources than instruments given to groups.

• How disruptive is the administration of the instrument to classroom
learning activities?

• Will the instrument provide data which are useful for decision making at
both the classroom level and the school and district level?

• Is the cost of purchasing the instrument reasonable and within the
allocated budget?

Each of these questions should be carefully reviewed during the process
of selecting appropriate assessment instruments.

When the program evaluator considers what instruments might best measure
a program's objectives, he or she needs to know the meaning of standardization
and the importance of reliability and validity. The following discussion
provides a brief review of these concepts.



Standardization 

Standardization implies different things to different people. For the
purposes of this discussion, if an evaluation instrument has the following
characteristics, it will be considered standardized:



General Characteristics of
Standardized Instruments

• Items are systematically structured.
• Specific directions are given on how
  to administer the instrument.
• Definite instructions explain how to
  deal with the information secured.
• Evidence is available on validity and
  reliability.



Mention of norms has been omitted from the list so that the broader definition
of standardized instruments might apply to some criterion-referenced tests,
questionnaires, and observation records.

Reliability and Validity

Reliability and validity are two important characteristics of evaluation
instruments. The reliability of a measure indicates the extent to which it is
consistent in measuring whatever it is meant to measure. Suppose, for example,
that a rifle placed in a vise were fired several times at the bull's-eye shown
below, and that the bullet holes formed a tight cluster, as shown. In this
case, the setting of the rifle would be reliable in that the bullets hit the
same area of the target each time the rifle was fired. The validity of a
measure, however, indicates the extent to which an instrument measures what it
is designed to measure. In this case, the setting would not be valid because
none of the bullets hit the desired target (the bull's-eye). Now suppose that
all the bullets were spread all over the target, as shown on page D-15. Even
though the bullets from the vise-held rifle hit the center of the target
several times, there were some stray shots, which indicates a degree of
inconsistency in the way in which the rifle performed. Thus, the setting of
the rifle in this case was neither reliable nor valid. The desired result
would be as shown at the bottom.




NEITHER RELIABLE NOR VALID 




RELIABLE AND VALID. 






The instrument that will be most useful in program evaluation will
measure in a consistent way (reliability) what it was intended to measure
(validity).

Validities of standardized tests often are expressed as validity
coefficients, numbers that express a degree of relationship, generally between sets
of scores from two different measurements. However, the type of validity that
is most important to program evaluation is content validity. Content validity
is arrived at judgmentally, by comparing each item in an instrument to the
objectives of the program. The key question to ask is: Does this item measure
an outcome the program sought to accomplish?

It is helpful when you do attempt to interpret reliability and validity
coefficients to have some guidelines as to what is acceptable, even though
there are no hard and fast rules. In general, reliability coefficients can be
expected to be higher than validity coefficients, primarily because
reliability is determined either on a single instrument or between
parallel forms of instruments, and validity is determined by two different
assessments of the same content.
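
As an illustration of what such a coefficient is, the sketch below computes a Pearson correlation between two sets of scores for the same pupils. The Guide does not say which statistic a given publisher reports, so the choice of the Pearson formula, the function name pearson_r, and the scores themselves are all assumptions made for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five pupils on two administrations
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]
retest_coefficient = pearson_r(first_administration, second_administration)
```

Used on two administrations of the same instrument, the result is a rough test-retest reliability estimate; used on scores from two different instruments covering the same content, the same arithmetic gives a validity coefficient of the kind described above.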



Guidelines on
Reliability and Validity Coefficients

Reliability

   .80-.99        High
   .50-.80        Questionable
   Below .50      Unacceptable

Validity

   Above .75      High
   .50-.75        Acceptable
   Below .50      Questionable
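
The guideline bands can be written as a small lookup, shown here only as a sketch. It assumes the reading of the table in which reliability of .80 and above is high, .50 to .80 questionable, and below .50 unacceptable, and validity above .75 is high, .50 to .75 acceptable, and below .50 questionable. The function names are hypothetical, and because the table lists .80 in two bands, placing exactly .80 in the "high" band is a further assumption.

```python
def rate_reliability(coefficient):
    """Label a reliability coefficient using the guideline bands."""
    if coefficient >= 0.80:   # assumption: exactly .80 counts as high
        return "high"
    if coefficient >= 0.50:
        return "questionable"
    return "unacceptable"

def rate_validity(coefficient):
    """Label a validity coefficient using the guideline bands."""
    if coefficient > 0.75:
        return "high"
    if coefficient >= 0.50:
        return "acceptable"
    return "questionable"

# Hypothetical coefficients taken from a test manual
labels = (rate_reliability(0.85), rate_validity(0.60))
```

As the text notes, there are no hard and fast rules, so labels like these are a starting point for judgment rather than a decision.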



LEARNING EXERCISE 9: RELIABILITY AND VALIDITY

Directions: For each statement below about an instrument's characteristics,
identify the explanatory statement about its validity and reliability that is
most likely to be true.

Explanatory Statements:

A. The instrument is both valid and reliable.

B. The instrument is valid but not reliable.

C. The instrument is reliable but not valid.

D. The instrument is neither valid nor reliable.

E. Not enough information is provided to make one
   of the above decisions.

___ 1. In an attempt to measure overall reading achievement, you have found
       that the test you are using correlates highly with a widely accepted
       test of social studies and moderately with a widely accepted test of
       reading comprehension.

___ 2. A student questionnaire is administered to a group of students at two
       different times, three weeks apart. The two sets of scores are very
       similar, student for student. In addition, the items on the
       questionnaire have been reviewed and accepted as important and relevant by
       both faculty and student reviewers.

___ 3. Even though an instrument you have selected seems to be measuring
       your instructional objectives, you find that the scores for any given
       student vary widely when the instrument is used the second time. An
       appreciable number of students do less well the second time. You are
       able to rule out extraneous influences such as physical environment,
       teacher performance, etc.

___ 4. This arithmetic test you are reviewing for possible use is found to
       correlate very highly with the Stanford-Binet.

___ 5. A parent questionnaire you are planning to use is judged to have
       items more appropriate for teachers than for parents. In addition,
       on two tests given to the same group of parents two
       weeks apart, you discover very little consistency in what a person
       does the second time as compared to the first time.



___ 6. You have found an instrument to use in classroom observation that
       comes to you highly recommended by a friend of yours in a neighboring
       district. His main caution is that the results you get may be
       heavily dependent on just who does the observation. However, it is
       evident that the instrument is designed to measure those things you
       are more interested in observing.

___ 7. You have discovered that an unobtrusive measure you have been using
       the last three years gives you results which are amazingly stable.
       However, the new goals established for the district make you think
       that this measure may no longer be appropriate.

ANSWERS

Because of the highly judgmental nature of some of the issues underlying some
of the situations described, four research scientists at the American Institutes
for Research were asked to key these items. Compare your answers with
theirs.

1. No consensus. Both A and C can be defended. If a test correlates
   highly with some acceptable test seemingly unrelated to it, it must
   be reliable. Hence C would be an appropriate answer. However, in
   this case, the social studies test could be highly loaded verbally
   and could be testing reading as much as social studies. Therefore,
   the test could be both reliable and valid.

2. A. Consensus.

3. D. Consensus. If you selected B, remember you cannot have validity
   without reliability.

4. No consensus. Three of our research scientists said E. One said
   C on the same basis that number 1 could be keyed C. If the arithmetic
   test correlates highly with some respected test, it must be reliable.

5. D. Consensus.

6. No consensus. Both D and E can be defended. If the friend's caution
   about results depending on who does the observation means it is
   impossible to get inter-rater reliability, the answer is D. If the
   caution implies the results depend on a high degree of experience and
   training in use of the instrument, it may be both reliable and valid.
   Not enough information is given to make this decision, hence E.

7. C. Consensus.



4. TYPES OF ASSESSMENT INSTRUMENTS

The selection of assessment instruments begins with the general question of
what is to be measured:



What is Being Measured?

• Achievement
• Performance
• Attitudes
• Interactions among persons
• Other behaviors



Depending upon what is to be measured, one of four kinds of instruments will
probably be used:



What Kinds of Instruments Will Be Used?

• Achievement tests
• Questionnaires
• Observational records
• Logs (pupil/teacher/school records)



In planning for data analysis, it is essential that careful attention be paid
to the types of items used on the various instruments and the kinds of scores
various item types yield.



What Kinds of Items Are There?

• Open-ended
• Objective
  - true/false (yes/no)
  - multiple choice
  - ratings
  - checklists
• Mixed (open-ended and objective)



What Kinds of Scores Are There?

• Raw scores
• Grade equivalents
• Percentiles
• Standard scores
• Stanines
• Categories
• Rankings
• Rating scales



The sections that follow will discuss each of these types of instruments and
give examples of item types and scores.

Achievement Tests

In the past few years, criterion-referenced tests have gained in popularity
until today they provide the program evaluator with an alternative to the
more traditional norm-referenced tests. The basic difference between the two
types of test is in their design and use. A norm-referenced test is designed
to place students in rank order or to compare them with other students. A
criterion-referenced test is designed to tell what a student knows,
understands, or can do in relation to specific objectives that are expected to be
realized.

Some advocates of criterion-referenced tests say there is little need
for the traditional norm-referenced test in program evaluation, and that
criterion-referenced tests are the only appropriate achievement tests to use. However,
the question is not one of either/or. Rather, it is what kind of information
you want. If you want to know how students stand in relation to some external
group (other schools in the district, the state as a whole, or the nation), a
norm-referenced test should be used. If you want to know where students stand
with respect to some standard of mastery, a criterion-referenced test would
be appropriate.

The evaluator should consider the objectives of the program carefully
before deciding whether to use a norm-referenced test or a
criterion-referenced test or both.



Norm-referenced tests. Funding agencies often require comparisons of the
results obtained by students in the program with the general school population.
If so, the use of standardized norm-referenced tests in a program evaluation
is necessary.

The major disadvantage of using such a test is that it may not measure
the specific content of the instruction provided in the program in question.
Since norm-referenced tests are constructed to be administered to students who
have been instructed in a wide range of curricula, the items cannot be expected
fully to reflect the content of any particular curriculum.

When selecting a norm-referenced test, the evaluator will want to consider
the kinds of scores the instrument will provide. The two most commonly used
test scores are grade equivalents and percentiles. Each has its advantages
and disadvantages. Grade equivalents have been particularly misunderstood
and misused, both by educators and by the public. A grade-equivalent score is
the mean or median score of the norm group at the time the test was normed.
For example, suppose a test for fourth graders is normed in the fourth month
of the school year, and on a 100-item test the average raw score is 50. A raw
score of 50 is then assigned a grade equivalent of 4.4.



[Figure: Raw Score and Grade Equivalent]




The actual range of raw scores may extend from 10 to 92 and the range of
grade-equivalent scores assigned to different raw scores from 2.5 to 6.3. By
definition, half the group is below average. It may be unrealistic to try to
bring everyone up to norm unless you truly believe your lowest achieving
pupils should be as good as the national average.

Criterion-referenced tests. These tests have become increasingly popular in
the last six to eight years because they provide a meaningful way to measure
achievement of locally set objectives. With a criterion-referenced test, an
overall score is generally not obtained. Rather, a small number of test items
is used to determine whether an objective has or has not been met.

There are two different kinds of performance criteria. The first,
Classroom Mastery Criterion, specifies the percentage of students in a
classroom who are expected to master an objective.



Classroom Mastery Criterion

70 percent of the students
will be able to identify
all the letters of the
alphabet.



The second kind of performance criterion is the Student Mastery Criterion,
which refers to the number of items in a criterion-referenced test that a student
should be able to respond to correctly in order to show that the student has
mastered the objective.



Student Mastery Criterion

To show mastery, the
student should respond
correctly to 60 percent of
the items designed to
measure a given objective.



Thus, scoring criterion-referenced tests gives percentages that relate
either to the group of students who achieve at a given level or to the
group of items responded to correctly.
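
The two criteria can be applied together, as in the sketch below. The 60 percent and 70 percent cutoffs are borrowed from the two illustrative boxes; combining them for a single objective, the function names, and the pupil results are assumptions made for the example. Integer arithmetic is used so that a pupil sitting exactly on a cutoff is counted as meeting it.

```python
def student_mastered(items_correct, items_total, cutoff_percent=60):
    """Student Mastery Criterion: enough of the objective's items answered correctly."""
    return items_correct * 100 >= cutoff_percent * items_total

def classroom_mastery(results, student_cutoff_percent=60, class_cutoff_percent=70):
    """Classroom Mastery Criterion: enough pupils meet the student criterion.

    results is a list of (items_correct, items_total) pairs, one per pupil.
    Returns the number of pupils showing mastery and whether the class met
    the criterion.
    """
    mastered = sum(
        1 for correct, total in results
        if student_mastered(correct, total, student_cutoff_percent)
    )
    met = mastered * 100 >= class_cutoff_percent * len(results)
    return mastered, met

# Hypothetical results for ten pupils on a five-item objective
pupil_results = [(5, 5), (4, 5), (3, 5), (2, 5), (5, 5),
                 (3, 5), (1, 5), (4, 5), (3, 5), (2, 5)]
pupils_mastering, class_met = classroom_mastery(pupil_results)
```

Here seven of the ten pupils answer at least three of five items correctly, so the class just meets a 70 percent criterion.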

Criterion-referenced tests, custom-made to school or district objectives,
are becoming increasingly available. Five sources are listed on page 23. In
addition, the ERIC Clearinghouse on Tests, Measurement, and Evaluation has
published a report that cites and describes 21 criterion-referenced tests.*
Commercial publishers of the more traditional norm-referenced tests are now
taking steps to meet growing demands for criterion-referenced tests.



* ED 099 427. Knapp, J. A collection of criterion-referenced tests. TM
  Report No. 31, 1974.



Types of Test Scores

Raw score

   Definition: Number of right answers obtained by an individual.

   Advantages: Easily obtained by counting right answers; appropriate
   for use with inferential statistical tests.

   Disadvantages: Must be changed to some type of derived score in order
   to make comparisons with a norm group.

Grade equivalent

   Definition: A score derived from a raw score that expresses grade level
   as an average (e.g., 8.2 is the achievement level expected of the
   average student in the second month of the eighth grade).

   Advantages: Reasonably sound inherent meaning in lower grades.
   Uses familiar units.

   Disadvantages: Easily confused with standards. By definition, half the
   group it was developed on are above the average and half are below
   average. Difficult to compare results of different tests. Not
   meaningful at upper grade levels.

Percentile

   Definition: The score below which a given percent of the cases lie.

   Advantages: Widely used and easily understood. Probably best all-around
   type of score, especially when used with percentile bands that
   account for the probable error of measurement.

   Disadvantages: Units along scale not equal in size. Differences near
   median are over-emphasized. Raw score differences between 90th and
   99th percentile are much greater than raw score differences between
   50th and 59th percentile.

Standard score

   Definition: A scaled score based on the mean and standard deviation
   which define the distribution of scores.

   Advantages: Has equal units through entire range of values. Has normal
   distribution by design. Appropriate for use with inferential
   statistical tests.

   Disadvantages: Not commonly used in local school settings except in
   large-scale national testing programs, such as those provided by
   Educational Testing Service and the American College Testing Program.
   Difficult for most people to understand.

Stanine

   Definition: A scaled score with a mean of 5 and a standard
   deviation of 2.

      Score    % of Scores
        1           4
        2           7
        3          12
        4          17
        5          20
        6          17
        7          12
        8           7
        9           4

   Advantages: Gives maximum information for a 9-unit scale. Reasonably
   easy to understand. Minimizes nonsignificant differences, as do
   percentile bands.

   Disadvantages: A single unit of change is very large and so will not
   reflect small differences in achievement. Not widely used.
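
To show how two of these derived scores relate, the sketch below converts a raw score to a percentile rank within a norm group and then to a stanine. It assumes the conventional stanine percentages of 4, 7, 12, 17, 20, 17, 12, 7, and 4, and it takes the percentile rank to be the percent of norm-group scores below the given score. The function names, the boundary handling, and the norm-group scores are all hypothetical; a published test supplies its own norm tables, which should be used instead.

```python
from bisect import bisect_right

# Cumulative percent of scores falling in stanines 1 through 8, from the
# conventional 4-7-12-17-20-17-12-7-4 split (stanine 9 takes the rest)
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]

def percentile_rank(score, norm_scores):
    """Percent of norm-group scores that fall below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

def stanine_from_percentile(percentile):
    """Map a percentile rank (0 to 100) to a stanine (1 to 9)."""
    return bisect_right(STANINE_UPPER_BOUNDS, percentile) + 1

# Hypothetical norm group of ten raw scores
norm_group = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
rank = percentile_rank(50, norm_group)
stanine = stanine_from_percentile(rank)
```

The mapping makes the table's point about coarseness visible: every percentile rank from 40 up to 60 lands in stanine 5, so small differences in achievement do not change the stanine.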



Where to Find Existing Criterion-Referenced Instruments

Where: Instructional Objectives Exchange (IOX)
       Box 24095
       Los Angeles, CA 90024
What:  Reading, Language, Mathematics, Social Studies (K-12)

Where: SCORE
       Westinghouse Learning Corp.
       P.O. Box 30
       Iowa City, IA 52240
What:  Reading/Language Arts, Mathematics, Science,
       Social Studies (K-8)

Where: Comprehensive Achievement Monitoring (CAM)
       Sequoia Union High School District
       480 James Avenue
       Redwood City, CA 94063
What:  Mathematics, Science, Geography, Business, Homemaking,
       Language Arts, Literature Comprehension, Foreign
       Languages (8- or 9-12)

Where: National Assessment of Educational Progress (NAEP)
       300 Lincoln Tower
       1860 Lincoln Avenue
       Denver, CO 80203
What:  Art, Career and Occupational Development, Citizenship,
       Literature, Mathematics, Music, Reading, Science, Social
       Studies, Writing (ages 9, 13, 17, adult)

Where: ORBIT
       CTB/McGraw-Hill
       Del Monte Research Park
       Monterey, CA 93940
What:  Mathematics, Reading, and Communications Skills (K-12)



Both NAEP and CAM are publicly supported projects, and materials are
either free or relatively inexpensive.

The types of items used with most achievement tests are objective; that
is, they are multiple choice, true/false, or matching and can be scored by
machine.



Questionnaires

Evaluators frequently use a questionnaire to assess opinions or attitudes of
participants in a program and/or those who are in some other way associated
with a program.

While there may be appropriate standardized questionnaires, most
evaluators either develop or adapt items from existing, nonstandardized instruments
according to their appropriateness for measuring a given program objective.

The development of even very simple questionnaires is a more exacting and
demanding task than is sometimes realized. Every instrument so developed must
include review and field-test steps in order to avoid ambiguous questions
which may yield meaningless or invalid information. To develop good
questionnaires takes talent, time, patience, and money. For this reason, a thorough
search should first be made to see if good instruments exist that will fit
program-evaluation needs.

Questionnaires may be administered much like achievement tests; they may
be mailed to individuals such as parents; or used by the program evaluator in
a structured interview with a group. The need for individual structured
interviews is dictated by the circumstances and may be filled by volunteers
from the community. Mailed questionnaires are especially subject to bias.
The people who typically fill out and return a mailed questionnaire may be
very unlike the population in general. One way to compensate for, or detect,
this bias is to follow up with telephone or door-to-door personal interviews
with a sample of nonrespondents, using the questionnaire as an interview
guide. The interview technique may also be called for if the target population
is very young, of questionable literacy, unskilled in the use of English, or if
the questions are very complex.
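
Drawing the follow-up sample of nonrespondents can be sketched as follows. The Guide does not say how the sample should be drawn; simple random sampling, the function name, and the identifier scheme are assumptions made for the example.

```python
import random

def sample_nonrespondents(mailed_to, returned, sample_size, seed=None):
    """Pick a simple random sample of people who did not return the questionnaire.

    mailed_to and returned are collections of identifiers (hypothetical here).
    A seed can be given so that the same sample can be drawn again.
    """
    nonrespondents = sorted(set(mailed_to) - set(returned))
    rng = random.Random(seed)
    # Never ask for more people than there are nonrespondents
    k = min(sample_size, len(nonrespondents))
    return rng.sample(nonrespondents, k)

# Hypothetical mailing: 100 households, the first 40 returned the form
follow_up = sample_nonrespondents(range(1, 101), range(1, 41), 10, seed=1)
```

If the interview answers from this sample differ markedly from the mailed returns, that is a sign the mailed results are biased.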






Major Uses of the Interview Technique

• To detect bias
• With young children
• With bilingual populations
• With low socioeconomic groups
• With complex questions



If an interview is deemed best, be certain your interviewers are trained
and are able to ask the question in a neutral manner without leading the
person being interviewed. They should be able to recognize a vague or ambiguous
response and should probe in some neutral manner such as, "Tell me more about
it," or "What do you mean?" until a clear response is obtained.



Guidelines in the Review and Selection of Questionnaires. There are a number
of things to consider in the selection (or development) of a questionnaire:

1. Are the questions asking only for needed information? There is a
   tendency among some persons who develop questionnaires to include
   nonessential items just because they are interesting or because he or
   she had always wondered about such details. Avoid trivia.

2. Are the words simple, direct, and apt to be familiar to all
   respondents? Education, like other professions, has a technical, sometimes
   mystical, jargon. If in doubt, ask one or two noneducators to read
   the items for understandability.

3. Are the questions clear and specific? Items that are too general,
   complex, or otherwise ambiguous will not get the information desired.
   Words such as often, occasionally, usually, many, any, much mean
   different things to different people. If used, they should be
   defined.

4. Are any items double-barreled? For example, the question "Do you
   plan to leave school and look for a job next year?" is addressing two
   issues. Each question should contain just one topic.

5. Are the questions loaded or leading? ("Why do you think
   instructional method A is so successful?" assumes everyone agrees that the
   method is successful.)

6. Do the questions apply to all respondents? A question directed to
   taxpayers of the community that asks, "Do you and your wife have
   school-aged children?" is based on too many assumptions.

7. Will the respondents' answers be influenced by response styles? A
   response style is a tendency to choose a certain response category
   regardless of item content. Examples of well-recognized response styles
   are:

   • Acquiescence

     Given a choice between "agree" or "disagree," a disproportionate
     number of "agree" responses will probably be obtained. Instead of
     "Do you agree with the new school policy on flexible scheduling?"
     ask:

     "The new school policy on scheduling as compared with the
     previous policy is
        ___ an improvement
        ___ not as good
        ___ about the same
        ___ don't know"

   • Social Desirability

     Some people tend to choose answers that they think everyone else
     will choose rather than those that express their own opinions. So
     avoid using questions that have a strong social preference for
     agreement or disagreement.

   • Ordinal or Position Bias

     If they are given a 5-point scale such as

        very good     good     fair     poor     very poor

     most persons will tend to avoid the extremes. This can be prevented
     to some degree by defining the scale points in specific terms.
     For example, on a leadership scale, instead of "very good," use
     "exceptional leader; able to take over and pull things into shape;
     people enjoy going along with him/her; respected by subordinates."
     "Very poor" might be "completely lacking; definitely a follower;
     does not try to convince others what is best."



Item Types in Questionnaires. Questionnaires and interview instruments
usually are structured to include a combination of two major classes of
items: open-ended and objective. The open-ended item offers the respondent
an opportunity to give his or her own answer. The objective item forces the
respondent to make a choice between two or more alternatives.

Open-ended Items.

During the second year of the Evaluation Improvement Project, a follow-up
study was done with a sample of first-year workshop participants. Questions
were designed to find out if the workshops really caused participants to
behave any differently in their approaches to program evaluation. One of
the open-ended items asked, "Are you doing anything differently in relation
to program evaluation this year than last year, attributable to your
participation in an EIP workshop?" Two hundred four usable statements were
made in response to this question. Examples of responses are shown below:

• Requiring evaluation process be established prior to introducing new
  program

• Broader approach; increased awareness of need; improved data-collection
  methods

• Providing more inservice for staff and aides related to objectives and
  utilizing test results as comparative data related to those objectives

• When planning and writing projects, more care is taken to plan for
  evaluation from the beginning of the project.

• I am building evaluation into the thinking-through of all department
  projects.

• Better process evaluation procedures and techniques; tried to build
  in the evaluation design rather than superimpose it.

• Spending more time selecting testing instruments to assure valid
  conclusions in evaluation

• Involving more people, rather than trying to handle everything
  needed to be done in any program evaluation.

• Better preparing of objectives; better choice of instruments; better
  overall picture of evaluation

• Working more with other staff members on follow-up utilizing test
  results

• Better job of evaluation of programs; better job of communicating
  with parents regarding evaluation; better participation of
  my staff in planning

How do you reduce 204 such statements to a meaningful summary of data?
The task is largely a matter of applying judgment and perseverance. One way
of proceeding would consist of the following steps:






1. List all responses to the question on as many pages as necessary (in
this case it took six and one-half pages to record all the information).

2. Read over 30-40 responses to get the flavor of what is being said.
Are some people saying the same thing but using slightly different
words? What are the key ideas that are being stated? Do categories
begin to form in your mind?

3. Try listing key categories. In the sample statements, the following
were among the key categories:

• More skilled with evaluation procedure
• Better data collection and analysis procedures
• More involved with evaluation
• Working more with staff
• Better organized
• More effective reporting
• Better selection of test instruments
• Greater awareness of need

4. Go back to the beginning of the list and try to classify each
statement under one of these categories.

5. If you find a statement that does not fit any category, create a new
one.

6. When you finish the classification, check the categories. Are two or
more categories near enough in meaning and intent that they may be
combined? Are there several categories with just one or two responses
for each? Should they be combined into one miscellaneous category?
Categorization into more than 12 or so separate categories probably
results in distinctions that are too fine.
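Once each statement has been classified by hand, the counting and the folding of sparse categories can be done mechanically. The sketch below is only an illustration: the function name `tally_categories`, the `min_count` threshold of 3 (read from "just one or two responses"), and the sample statements are assumptions, not part of the EIP survey.

```python
from collections import Counter

def tally_categories(coded_statements, min_count=3, misc_label="Miscellaneous"):
    """Count statements per category, folding sparse categories together.

    coded_statements: list of (statement_text, category) pairs, where the
    category was assigned by a human reader.
    min_count: categories with fewer statements than this are combined
    into one miscellaneous category (an assumed threshold).
    """
    raw = Counter(category for _, category in coded_statements)
    folded = Counter()
    for category, n in raw.items():
        target = category if n >= min_count else misc_label
        folded[target] += n
    return folded

# Hypothetical coded statements (not the EIP survey data).
coded = [
    ("Plan for evaluation from the start", "More skilled with evaluation procedure"),
    ("Build evaluation into project design", "More skilled with evaluation procedure"),
    ("Better preparing of objectives", "More skilled with evaluation procedure"),
    ("Better process evaluation techniques", "More skilled with evaluation procedure"),
    ("Involving more people", "Working more with staff"),
    ("Working with staff on follow-up", "Working more with staff"),
    ("Better participation of my staff", "Working more with staff"),
    ("Drew a sample of classrooms", "Used sampling technique"),
    ("Sampled test results", "Used sampling technique"),
    ("Pushed harder for evaluation", "More aggressive"),
]

counts = tally_categories(coded)
for category, n in counts.most_common():
    print(f"{category:45s} {n:3d}  {round(100 * n / len(coded)):3d}%")

# The guide suggests that more than 12 or so categories is too fine.
if len(counts) > 12:
    print("More than 12 categories: consider combining some.")
```

The judgment about which category a statement belongs to stays with the reader; the code only keeps the bookkeeping consistent.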



D-33- 



The final results on the EIP survey were presented in this manner:

Changes in Participants' Evaluation Activities
Attributable to Attendance at EIP Workshop

Category                                          Number    Percent

More Skilled with Evaluation Procedure              64         31
Better Data Collection and Analysis                 25         12
More Involved with Evaluation                       23         11
Working More With Staff                             21         10
More Organized                                      16
Better Reporting                                    12          6
Changed/Adjusted Evaluation Design
  and Ongoing Project                               11          5
Better Selection of Test Instruments                10
Greater Awareness of Need                           10
Not Applicable                                       6          3
More Aggressive                                      4          2
Used Sampling Technique                              2

Total Number of Statements Categorized             204
Total Number of Respondents                        199



Note that 204 statements were categorized from 199 persons who responded.
Most persons gave a response that fit into just one category; a few gave
responses that fit into more than one.

If there is a wide difference between the number of statements categorized
and the number of persons responding, the evaluator may wish to look back to
find out if a few persons are being so verbal as to bias the results.

This type of data collection yields category data, sometimes called
content analysis. The data collection section of this Guide suggests
statistical techniques to use in treating category data.
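The check for overly verbal respondents can be made concrete with a small sketch. It assumes each coded statement carries a respondent identifier; the names `verbosity_report` and `max_share` and the sample data are hypothetical, and the 25 percent cutoff is an arbitrary illustration rather than a rule from this Guide.

```python
from collections import Counter

def verbosity_report(coded, max_share=0.25):
    """Compare statements categorized with persons responding.

    coded: list of (respondent_id, category) pairs.
    max_share: flag any respondent who contributes more than this share
    of all categorized statements (an assumed cutoff).
    Returns (total statements, number of respondents, flagged respondents).
    """
    per_person = Counter(respondent for respondent, _ in coded)
    total = sum(per_person.values())
    flagged = {r: n for r, n in per_person.items() if n / total > max_share}
    return total, len(per_person), flagged

# Hypothetical data: respondent "r3" supplies half of all statements.
coded = [
    ("r1", "A"), ("r2", "A"),
    ("r3", "A"), ("r3", "B"), ("r3", "C"), ("r3", "D"),
    ("r4", "B"), ("r5", "C"),
]
total, respondents, flagged = verbosity_report(coded)
print(total, respondents, flagged)

# For the EIP survey itself the ratio is close to one statement per person.
eip_ratio = 204 / 199
print(round(eip_ratio, 2))
```

A ratio near 1, as in the EIP survey, suggests that no small group of respondents dominates the tallies.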



Advantages and Disadvantages of Open-Ended Items

Advantages

1. Provide freedom and spontaneity in response

2. Respondents usually like being asked for their opinions; good warm-up

3. Useful for determining range of responses which is not possible with
   objective items

4. Testimonials lend color to the research report

Disadvantages

1. Difficult and time-consuming to score

2. Many open-ended items make an instrument too time-consuming for
   respondent

3. Responses may be related to general verbal facility of respondent

4. Testimonials, if not balanced by more objective evidence, result in
   evaluations that have little substance



Objective Items 



The objective item provides the respondent with a structured response. There
are several types of structures: checklists, multiple-choice items, rating
scales, and rankings.

In the checklist, the respondent is given a list of items and asked to
check all that apply. For example, in a follow-up study of EIP workshop
participants, one item dealt with whether or not the participant had taken
steps to get others to improve skills in program evaluation. Those who said
they had were then asked to check which of the following actions they had
taken:



Encouraged staff to attend EIP workshop
Conducted evaluation workshop locally
Circulated EIP material for review and study
Circulated other materials related to program evaluation
Talked informally with staff about problems related to program evaluation
Helped colleagues with program evaluation problems
Other (specify)



Items that require a "yes" or "no" response are like a checklist in that
both create category data and would be analyzed in similar ways. Here are
two examples:

Item from a Reading Lab Questionnaire

1. Do you feel you have developed better reading
   habits due to this course?                        Yes      No

Item from a Self Concept Questionnaire

1. I feel left out of things in class.               Yes,     No,
                                                     Like Me  Not Like Me







A multiple-choice item requires the respondent to make a judgment based
on a specific set of alternatives.

Example 1: Teacher Judgment

Which one of the changes listed below did you find most helpful
in implementing the new reading program?

Improved selection of curriculum materials
Inservice training workshops
Increase in number of teacher aides
Grouping of students






-36 



Example 2: Quantity and Intensity Scale

For what portion of your activities as a program evaluator do
you receive clear and specific directions from your supervisor?

For almost all activities
For most of my activities
For about half
For few of my activities
For almost none of my activities

Example 3: Amount of Time Scale

When you are working, what is the average day like for you?
How often does time seem to drag?

About half the day or more
About one-third of the day
About one-fourth of the day
About one-eighth of the day
Time never seems to drag



The advantages of the multiple-choice type of item are primarily in their ease
of administration, scoring, and analysis. The greatest problems relate to
their careful development, avoidance of ambiguities, and reasonableness of
choices. These points are discussed more fully at the end of this section.

Rating scales assign numerical values to the various responses to an
item in order to spread them. That is, a rating scale gives the rater the
opportunity to present his or her opinion on a continuum of judgment. Most
rating scales permit the rater a choice of three to five values. For example:



Item From a Teacher Questionnaire

1. What is your overall reaction to the effectiveness of individualized
   instruction?

Extremely    Somewhat    Neither Pleased    Somewhat      Extremely
Pleased      Pleased     Nor Displeased     Displeased    Displeased

    1            2              3                4             5



D-3? 



Choice of an odd number rating scale allows the respondents to adopt a
neutral position. (An even number of choices would force him or her to take a
position.) Before deciding on the number of scale points, decide whether or
not you want respondents to take a position.

The selection of descriptors for each rating on the scale is most important.
Insofar as possible, they should mean the same thing to all expected to
respond. In the above example, you might instruct the respondent as follows:



Descriptor                        Means, in Comparison to All Techniques
                                  You Have Used

Extremely pleased                 Among the top 10 percent of techniques you
                                  have used

Somewhat pleased                  Better than most but not among the top 10
                                  percent

Neither pleased nor displeased    About average in comparison to other
                                  techniques

Somewhat displeased               Below average, but not among the worst 10
                                  percent

Extremely displeased              Among the worst 10 percent of techniques you
                                  have used
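Because each descriptor carries a numerical value, responses to a rating item can be summarized as a frequency distribution and a median. The following is a minimal sketch under stated assumptions: the 1-to-5 coding follows the teacher questionnaire item above, while the function name `summarize_ratings` and the sample responses are hypothetical.

```python
from collections import Counter
from statistics import median

# Scale values as printed under the teacher questionnaire item.
LABELS = {
    1: "Extremely pleased",
    2: "Somewhat pleased",
    3: "Neither pleased nor displeased",
    4: "Somewhat displeased",
    5: "Extremely displeased",
}

def summarize_ratings(ratings):
    """Return (count per descriptor, median rating) for one rating item."""
    invalid = [r for r in ratings if r not in LABELS]
    if invalid:
        raise ValueError(f"ratings outside the scale: {invalid}")
    counts = Counter(ratings)
    distribution = {LABELS[value]: counts.get(value, 0) for value in LABELS}
    return distribution, median(ratings)

# Hypothetical responses from ten teachers.
responses = [1, 2, 2, 3, 2, 1, 4, 2, 5, 2]
distribution, middle = summarize_ratings(responses)
for descriptor, n in distribution.items():
    print(f"{descriptor:32s} {n}")
print("Median rating:", middle)
```

Reporting the full distribution alongside the median keeps the neutral and extreme responses visible instead of hiding them in a single number.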



Care must be taken with rating scales to ask precisely what is wanted.
The following will illustrate the point:

Another type of rating scale commonly used to measure attitudes consists
of a series of statements, each of which has its own scale value. Typically,
the statements are arranged in order from highly positive to highly negative.
The person whose attitude is being measured is simply asked to check those
statements with which he or she agrees. The score is obtained by adding the
values assigned to the statements checked. An example of this type of scale
is given on the following page.

In the example that shows the tallying and scoring, on page D-39, the median
score of 4.02 for the group falls at item 6, "It solved some problems for me."
This would ordinarily represent the tendency for the group. However, there was a






Kropp-Verner Attitude Scale for
Measuring Effectiveness of Meetings*

Directions: Check (√) below only those statements which accurately reflect
your personal reaction to the Evaluation Improvement Program
workshop.

Check Here

 1. It was one of the most rewarding experiences I have ever had.
 2. Exactly what I wanted.
 3. I hope we can have another one in the near future.
 4. It provided the kind of experience I can apply to my own situation.
 5. It helped me personally.
 6. It solved some problems for me.
 7. I think it served its purpose.
 8. It had some merits.
 9. It was fair.
10. It was neither very good nor very poor.
11. I was mildly disappointed.
12. It was not exactly what I needed.
13. It was too general.
14. I am not taking any new ideas away.
15. It didn't hold my interest.
16. It was much too superficial.
17. I left dissatisfied.
18. It was very poorly planned.
19. I didn't learn a thing.
20. It was a complete waste of time.

To Be Completed by Trainer

Score
Participant's Median Score

*Kropp, R.P., and Verner, C. An attitude scale technique for evaluating
meetings. Adult Education, VII(4), Summer 1957.



Scoring of Kropp-Verner Scale

Tally the items checked by participants in column (2) of the form
below. In column (3) write the total number of tallies. Obtain
totals for column (3) and find the median score.

 (1)      (2)              (3)                   (4)
 Item     Checked by       Checked by            MD
          Participant      Participant
          [Tally]          [Total of Tallies]

   1                            6                 1.13
   2                            6                 1.58
   3                           22                 2.25
   4                           44                 2.77
   5                                              3.40
   6                           41                 4.02
   7                                              4.44
   8                           20
   9                                              5.30
  10                                              6.02
  11                                              6.78
  12                                              6.97
  13                                              7.19
  14                                              7.45
  15                                              8.19
  16                                              8.62
  17                                              9.29
  18                                              9.69
  19                                             10.26
  20                                             10.89

 Total                        217
 Median                                           4.02





larger frequency at item 4: "It provided the kind of experience I can apply
to my own situation." Therefore, it might be better to report that a
considerable number of persons responded that, "It provided the kind of
experience that I can apply to my own situation," and that "It solved some
problems for me."

Statements 5 and 7 were checked a number of times and these might be
mentioned as well. Extreme scores, too, are interesting. There were six
persons who said, "It was one of the most rewarding experiences I have ever had"
(item 1). Six persons also said, "Exactly what I wanted" (item 2). But note
that we do not know whether the same six checked statements 1 and 2. At the
other extreme, only five checks reflected attitudes on the negative side of
neutral, and this does not necessarily represent five different persons.
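The group median described above can be located by accumulating the tallies in scale order until the middle check is reached. The sketch below is illustrative only: the scale values are those printed for items 1 through 7, but the tallies are hypothetical, and the function name `median_scale_value` is an assumption. With an even number of checks it reports the item holding the upper of the two middle checks.

```python
def median_scale_value(scale_values, tallies):
    """Return the scale value of the item that contains the middle check."""
    total = sum(tallies)
    if total == 0:
        raise ValueError("no checks were tallied")
    midpoint = (total + 1) / 2          # position of the middle check, from 1
    running = 0
    for value, count in sorted(zip(scale_values, tallies)):
        running += count
        if running >= midpoint:
            return value

# Scale values for items 1-7 as printed in column (4) of the scoring form.
scale_values = [1.13, 1.58, 2.25, 2.77, 3.40, 4.02, 4.44]
# Hypothetical tallies for a small group (not the workshop data).
tallies = [1, 1, 3, 10, 2, 9, 8]

group_median = median_scale_value(scale_values, tallies)
# Item checked most often (1-based item number).
most_checked_item = max(range(len(tallies)), key=tallies.__getitem__) + 1

print("Median scale value:", group_median)
print("Most frequently checked item:", most_checked_item)
```

As in the workshop example, the median can fall at one item while a different item is checked most often, so reporting both gives a fuller picture.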

Under some circumstances, you may wish to use rankings, asking respondents to
arrange a series of options in rank order according to personal preference.
When the number of things to be ranked is small and homogeneous, the ranking
may force persons to make discriminations they would not otherwise make. For
example, this section of the Guide contains a number of key concepts on
selecting and developing evaluation instruments. An appropriate posttest
might seek to find out which topics are the most helpful to evaluators.
Consider two methods for trying to collect these judgments. The first is a
ranking procedure; the second is a rating procedure.

Which among the following topics did you find most helpful personally?

Method 1: Ranking Procedure

Directions: Rank order by assigning a "1" to that topic which was most
helpful, a "2" to that section you found next most useful,
etc.

1. reliability and validity
2. achievement tests
3. questionnaires
4. observational techniques
5. other behavior
6. sources of information about instruments
7. developing assessment instruments



D-41 



Method 2: Rating Procedure

Directions: Rate on a 5-point scale each of the following major topics
discussed in this section, using the following scale:

1 = Of no use: I will never need to know or use this.
2 = Of minimal use: I may have to use this information some time.
3 = Of some potential use: If I have to make use of this information,
    this topic will be helpful.
4 = Of considerable use: I expect I will need to use this information.
5 = Of maximum use: I will surely have to make use of this information.

                                   1        2         3         4         5
                                 no use  min. use  some use  much use  max. use

1. Reliability and Validity
2. Achievement Tests
3. Questionnaires
4. Observational Techniques
5. Other Behavior
6. Sources of Information
   about Instruments
7. Developing Assessment
   Instruments










Different kinds of information are asked for in the two methods. The
first method asks how each topic stands in relation to the other topics
(norm-referenced approach). The second method asks how valuable each topic is
in terms of its usefulness (criterion-referenced approach).

Actually, the two approaches could be combined, and both kinds of
information obtained. The point is, the program evaluator must anticipate what
kind of results are wanted by knowing beforehand how those results will be
used.
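The difference between the two kinds of information shows up as soon as the responses are summarized. The sketch below uses three of the seven topics and invented responses from three participants; the helper `column_means` and all data are hypothetical and serve only to contrast relative standing (mean rank) with absolute usefulness (mean rating).

```python
from statistics import mean

topics = ["reliability and validity", "achievement tests", "questionnaires"]

# Hypothetical responses from three participants.
rankings = [   # Method 1: one row per person, 1 = most helpful
    [1, 3, 2],
    [2, 3, 1],
    [1, 2, 3],
]
ratings = [    # Method 2: one row per person, 1 (no use) to 5 (maximum use)
    [5, 4, 5],
    [4, 3, 5],
    [5, 4, 4],
]

def column_means(rows):
    """Average each topic (column) across all respondents (rows)."""
    return [mean(column) for column in zip(*rows)]

mean_rank = dict(zip(topics, column_means(rankings)))
mean_rating = dict(zip(topics, column_means(ratings)))

for topic in topics:
    print(f"{topic:28s} mean rank {mean_rank[topic]:.2f}   "
          f"mean rating {mean_rating[topic]:.2f}")
```

In this invented data "achievement tests" ranks last, yet its mean rating is still above 3 ("of some potential use"): the ranking says where a topic stands relative to the others, and the rating says how useful it is in its own right.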




Summary of Questionnaire Item Structure

Item Type         Advantages                     Disadvantages

Open-Ended        - Free responses               - Difficult and time-consuming
                  - Reasons can be given           to respond to, to score, and
                                                   to interpret

Checklist         - Simple options               - Limited response
                  - Easy to interpret            - Only present/absent
                                                   or yes/no responses

Multiple          - Provides closure on          - Limited response
Choice              questions                    - Only correct or
                  - Simple options                 incorrect responses
                                                 - Limited information
                                                   on reason for
                                                   judgment expressed

Ratings           - Degree of judgment           - Directionality
                    identified; values             (-/+) confusing
                    assigned                     - No information on
                                                   reason for judgment

Rankings          - Provide a norm-              - Can only be used with
                    referenced approach            limited number of
                  - Easy to develop and            homogeneous topics
                    use



LEARNING EXERCISE 10: JUDGING ITEMS

"At the end of the eighth month of the school year, 70 percent of the
participating students and parents will judge their reading program to have been
successfully implemented, as measured by a questionnaire."

The evaluator accepts as valid this measure of students' and parents' attitudes
about how much has been learned and whether enjoyment of reading and independent
reading practice have increased.
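Whether an objective of this form has been reached comes down to comparing an observed share with the stated criterion. The sketch below assumes each returned questionnaire has already been reduced to a single yes/no judgment of "successfully implemented"; how that reduction is made is not specified here, and the function name `criterion_met` and the counts are hypothetical.

```python
def criterion_met(judgments, required_share=0.70):
    """Return (share judging the program successful, whether criterion is met).

    judgments: list of booleans, one per respondent; True means the
    respondent judged the program to have been successfully implemented.
    required_share: the 70 percent criterion from the objective.
    """
    if not judgments:
        raise ValueError("no responses to evaluate")
    share = sum(judgments) / len(judgments)
    return share, share >= required_share

# Hypothetical returns: 36 of 50 respondents judged the program successful.
judgments = [True] * 36 + [False] * 14
share, met = criterion_met(judgments)
print(f"{share:.0%} judged the program successful; criterion met: {met}")
```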

The questions on pages 44-46 represent a preliminary list of those that may
be used on the questionnaire. Your task is to criticize each question and
decide which ones should be included, revised, or discarded and to give the
reasons.



In criticizing each item, consider the following three important criteria:

1. Appropriateness or Validity of the Item

Is the item assessing something (knowledge, behaviors, attitudes, etc.)
which indicates whether or not the evaluation objective has been reached?

2. Clarity of the Item

Is the item written in a way that everyone will interpret it in more or
less the same way? Is the item misleading or ambiguous? Does it present
just one concept at a time?

3. Accuracy of the Response

Do you think that the person answering the item will give an honest
response or an accurate response? Is it emotionally loaded? Does it tend
to bias or lead the respondent?

Study each question in relation to the above criteria. Decide whether you
would accept it as is or revise or discard it. If you decide that any of the
items should be revised or discarded, enter an "R" or a "D" in the appropriate
columns on pages 44-46 and then state your reasons. When you have completed
all 12 questions, go back and try to revise those items you decided should be
revised, using the space on page D-46.






Student and Parent Questionnaire on Reading Program
Critique of Items

(If you decide to accept an item, check the "Accept" column. If you
decide to revise or discard, enter "R" or "D" and write reasons in
the next column. After finishing all questions, enter revisions.)

Item                                    Accept   Revise or   Reasons   Revisions
                                                 Discard

 1. Did you learn anything from
    this year's reading program?

 2. Do you think the instructional
    methods of this class are better
    than in other classes you have
    taken?

 3. Did you like this year's program?

 4. As compared with the reading
    program taken last year, do
    you feel that this program
    was better?

 5. Do you think getting individual
    help is a good way to learn?

 6. Were the teaching assistants
    pleasant and helpful?

 7. Are parents pleased with the
    progress you have made in
    this year's reading program?

 8. Would you recommend this
    program to someone who needed
    to improve his or her
    reading?

 9. If a more advanced class
    like this was formed,
    would you want to be in it?

10. Do you enjoy reading more
    now because of the program?

11. Do you read more books on
    your own than you did last
    year because of the program?

12. In terms of how well you
    now read, do you think
    this year's program has
    something to do with that?

' ^ ' . D-47 



ANSWERS

JUDGING ITEMS

(1)     (2)       (3)          (4)
Item    Accept    Revise or    Possible Reasons
                  Discard

 1.               X            Too general: Learn "anything" could include
                               "Yes, I learned to hate books."

 2.               X            "Instructional Methods" is jargon which may
                               have little meaning for ?th graders.

 3.     X

 4.               X            Too general, vague and ambiguous. May encourage
                               yes response for extraneous reasons.

 5.     X

 6.               X            Asks for two judgments in a single item
                               ("Pleasant and helpful"). Pleasantness is
                               not necessarily related to the learning
                               process.

 7.               X            Should be individualized to fit one student and
                               his or her parents.

 8.     X

 9.               X            "More advanced" may be ambiguous; highly
                               competitive students may be more inclined to
                               say yes than others; some students may reason
                               that they did so well in the current class
                               they really don't need more.

10.     X

11.     X

12.               X            May mean either the student thought his good
                               results or his poor results could be attributed
                               to the program.






D-48 



Observational Techniques and Instruments 

While both achievement tests and questionnaires can give valuable information
for program evaluation, there are many kinds of information that cannot be
obtained from them. Observational techniques and instruments for recording
observations provide an added dimension. As with any other assessment
instrument, there are both advantages and disadvantages in using them.



Advantages and Disadvantages of Observation

Advantages

1. Can provide valid and reliable information on social-emotional-
   personal adjustment not possible with other traditional methods.

2. Can test a person's ability to apply information in life-like
   situations.

3. Easily adapted to a variety of tasks, settings, and individuals
   at all educational levels.

4. Provides a valuable supplement to achievement data.

5. Can provide both qualitative and quantitative data.

Disadvantages

1. Difficult to get valid and reliable data.

2. Long period of training and experience may be required
   for the observer.

3. Many activities take place simultaneously in a classroom
   and it can be difficult to record behaviors that are
   significant.

4. Interpretation of observational findings must take into
   account the context, must not generalize from a too
   limited sampling of behaviors, must not give disproportionate
   weight to negative incidents, and must be as objective as
   possible, given the data at hand.



In program evaluation, observational techniques are most helpful in
obtaining data on:

• Group participation and responsibility
• Individual student interaction with the group
• Teacher interaction with class

A wide variety of instruments may be used to record observations. Rating
scales and checklists are commonly used. But anecdotal records, sociometric
techniques, and highly developed systems, such as the Flanders System of
Interaction Analysis, are also ways of collecting observational data. Two
examples follow:

Example 1: Observation and Analysis of Question-
Answer-Feedback Sequences in Classroom
Instruction

Suppose one objective of a program is to improve teaching techniques that
encourage student participation in general classroom discussions. Suppose
further that teachers have been told that the use of praise and affirmation of
students' correct responses is to be preferred over negative or critical
remarks about students' incorrect responses, and that directing a question to
another student or rephrasing to make it easier are to be preferred over simply
giving the class the answer.

The observation system shown on the following page could be used both
before teachers receive instruction and afterward to determine the
effectiveness of teacher training. The instrument for recording observation
could be printed on both sides of 4" x 6" cards, as illustrated in Figure 1.
Effective use of this instrument would require setting up a schedule for
observing each teacher, both before and after they have received instruction.
Several observations at periodic intervals after instruction might be
scheduled. Important considerations include:

1. The amount of observation time should be the same for each classroom.



Flanders, N.A. Teacher influence, pupil attitudes, and achievement.
Cooperative Research Monograph No. 12, OE 25040, Washington, D.C.: U.S.
Government Printing Office, 1965.



Figure 1. System for Recording Observations of Teachers' Reactions*

Coding Categories for Question-Answer-Feedback Sequences

STUDENT SEX

Symbol   Label     Definition
M        Male      The student answering the question is male.
F        Female    The student answering the question is female.

STUDENT RESPONSE

Right         The teacher accepts the student's response as correct or
              satisfactory.
Part right    The teacher considers the student's response to be only
              partially correct or to be correct but incomplete.
Wrong         The teacher considers the student's response to be incorrect.
No answer     The student makes no response or says he doesn't know (code
              student's answer here if teacher gives a feedback reaction
              before he is able to respond).

TEACHER FEEDBACK REACTION

Praise        Teacher praises student either in words ("fine," "good,"
              "wonderful," "good thinking") or by expressing verbal
              affirmation in a notably warm, joyous, or excited manner.
Affirm        Teacher simply affirms that the student's response is correct
              (nods, repeats answer, says "Yes," "OK," etc.).
No reaction   Teacher makes no response whatever to student's response; he
              simply goes on to something else.
Negate        Teacher simply indicates that the student's response is
              incorrect (shakes head, says "No," "That's not right,"
              "Hm-mm," etc.).
Criticize     Teacher criticizes student, either in words ("You should know
              better than that," "That doesn't make any sense; you better
              pay close attention," etc.) or by expressing verbal negation
              in a frustrated, angry, or disgusted manner.
Teacher gives answer (Gives Ans.)
              Teacher provides the correct answer for the student.
Teacher asks another student (Ask)
              Teacher redirects the question, asking a different student to
              try to answer it.
Another student calls out answer (Calls)
              Another student calls out the correct answer and the teacher
              acknowledges that it is correct.
Repeats question (Repeat)
              Teacher repeats the original question, either in its entirety
              or with a prompt ("Well?" "Do you know?" "What's the answer?").
Rephrase or clue (Clue)
              Teacher makes original question easier for student to answer
              by rephrasing it or by giving a clue.
New question (New Ques.)
              Teacher asks a new question (i.e., a question that calls for a
              different answer than the original question called for).

Recording form: numbered rows (through 15), with columns for STUDENT SEX,
STUDENT RESPONSE, and TEACHER FEEDBACK REACTION (including GIVES ANS., ASK,
CALLS, REPEAT, CLUE, NEW QUES.).

*Good, T. L., and Brophy, J. E. Looking in classrooms. San Francisco: Harper
and Row, 1973, pp. 62 and 63. Reprinted with permission (c) 1975, Harper and
Row, Publishers, Inc.



D-51 



2. Set observation times that will be best for all classrooms. Avoid
periods immediately preceding or following vacations or special
events. Early Monday morning and late Friday afternoon should be
avoided.

3. Where possible, assign classrooms randomly to different time blocks.
Classes and teachers vary as the school day goes on. In classroom
observation, you want to get a fair sampling of classroom climate
across all classrooms.

4. If a number of different observers are used, be sure they are adequately
trained in the observation procedure and that inter-rater reliability
has been checked (this is discussed in a later section).
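One simple way to check whether two observers are coding alike is to have them code the same sequence of events and compute the percentage of events on which their codes match. This is offered as a sketch under stated assumptions: percent agreement is a common first index, not necessarily the method the later section of this Guide recommends, and the function name `percent_agreement` and the coded sequences are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of events that two observers coded identically."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both observers must code the same events")
    if not codes_a:
        raise ValueError("no events were coded")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

# Hypothetical teacher feedback codes for the same ten question-answer
# sequences, using category labels from Figure 1.
observer_1 = ["Praise", "Affirm", "Affirm", "Negate", "Clue",
              "Repeat", "Affirm", "Gives Ans.", "Praise", "No reaction"]
observer_2 = ["Praise", "Affirm", "Praise", "Negate", "Clue",
              "Repeat", "Affirm", "Ask", "Praise", "No reaction"]

agreement = percent_agreement(observer_1, observer_2)
print(f"Observers agreed on {agreement:.0f}% of the sequences")
```

The events on which the observers disagree are worth reviewing together, since they usually point to a category definition that needs tightening.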

Example 2: Interaction among Groups

The next example is an observational technique that was used in a national
survey* of 13- and 17-year-olds as a measure of an objective dealing with
the ability to apply democratic procedures on a practical level when working
in a group. It demonstrates one way of measuring interaction among students
and illustrates the need for very explicit directions in the training of
observers and the recording of data.

Setting: (A group of eight students was asked to choose from a list
the five most important issues between teenagers and
adults, to rank order them according to importance, and to
write a recommendation for at least the two most important
problems, and for all five if they had time. They had 30
minutes to complete the task. The only rule was that a
majority of the group must agree on anything they wrote.
Two observers recorded individual acts of the group members
as they discussed the issues, each observer recording
different types of behavior. At no time did the observers
participate in the discussion.)



*National Assessment of Educational Progress. Citizenship: National results.
Denver, Colorado: National Assessment of Educational Progress, November
1970.



List of Issues

Age 13                                        Age 17

Time Limits (for being home, in bed, etc.)    Censorship
Home Duties                                   Curfew
School Assignments                            Voting Age
Adult Books and Movies                        Drinking
Sports and Other Activities                   Smoking
Dating and Parties                            Working Rules and Laws
Parents' Approval of Friends                  Marriage Rules and Laws
Money (where from and how spent)              Auto Insurance
Dress and Appearance                          Dress and Appearance
Smoking                                       Military Service
Swearing                                      School Attendance
Being Talked to Like an Adult                 Civil Liability
                                              Criminal Liability

The purpose was not to find out how students ranked issues but to observe
the process by which they arrived at ranking decisions. Specifically, the
behaviors to be looked for were:

• Took a clear position

• Gave a reason for a point of view

• Sought information related to the game from other
  team members or from the administrator

• Steered the task by organizing the group or by
  suggesting a change in procedure

• Defended the right of another group member to be
  heard or to hold a different opinion

• Defended own viewpoint contrary to a previous
  consensus

• Nontask behaviors



D-.53 



The recording forms were something like this:

Observer 1

School            Location            Date            Time

Student   Gave a     Took a      Opposed Group Alone (O)
          Reason     Position    Yielded (Y)
                                 Convinced (C)
1.
2.
3.
4.
5.
6.
7.
8.

Each number identifies a given student.

Observer 2

Date              Location            Time

Student   Steered    Sought         Defended    Nontask
          Task       Information    Another     Action
1.
2.
3.
4.
5.
6.
7.
8.

Each number identifies a given student.



D-54 

However, the general instructions to observers and the specific behav.. .-:s to 
be observed were more explicit: ' . . ' ■ 

General Instructions to Both Observers

1. Only overt actions are to be recorded, not general impressions.

2. A single event or action may be scored in more than one category.
   Many remarks made by group members will not be scoreable in any
   category whatever.

3. When the group begins the task, observers should take positions in the
   background, not as members of the working group, so that the
   group will not depend on the observers as moderators, leaders, etc.
   The observers must be seated close enough and in such a way that they
   can easily identify who is talking. So as not to be confused by
   numbering, both observers should probably sit where student No. 1 is
   at their immediate left.

4. Reliable observation can be maintained only by intensive effort and
   practice in use of the categories. Before each session, an observer
   should review carefully the categories he is to observe so that he
   can keep incisive definitions clearly in mind at all times. Tryouts
   have indicated that it is all too easy for the observer to err in two
   directions in particular: (a) the concept of the behavior category
   is loosened so that too many inappropriate behaviors are included;
   (b) concentrating on certain categories, other categories are not
   attended to, and behaviors fitting those categories are thereby not
   included.

5. Whenever an indicated behavior occurs, the observer should make a
   check (✓) in the appropriate column on the line for the student who
   demonstrated that behavior. With the exception of the "Oppose Group
   Alone" category, a student is scored only the first time he demonstrates
   the behavior. Observers will find that some categories will
   be scored for most students quickly in the session. Observers should
   then focus their attention mainly on those categories not yet scored
   and on those students who speak infrequently, so as not to miss the
   rare times these categories and students will be scored. There is no
   need to give further attention to categories and students already
   checked (except instances of "Oppose Group Alone").

6. If in doubt whether or not a particular behavior should be scored,
   the observer should not score it. After each session, any confusions
   about scoring should be discussed between team members and the
   project director called if necessary to resolve a frequently occurring
   problem.
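The scoring rule in instruction 5 can be sketched in a few lines of Python. This is only an illustration, not part of the original survey procedure: the function name `tally`, the event format, and the sample session are assumptions made for the example. The category labels are taken from the recording forms.

```python
from collections import defaultdict

# The one category the instructions say may be scored more than once.
# The label follows the recording form; treating it as a set is an assumption.
REPEATABLE = {"Opposed Group Alone"}


def tally(events):
    """Return {student: {category: count}} from (student, category) events.

    A category is counted once per student, except for repeatable
    categories, which are counted every time they occur.
    """
    sheet = defaultdict(dict)
    for student, category in events:
        seen = sheet[student].get(category, 0)
        if seen == 0 or category in REPEATABLE:
            sheet[student][category] = seen + 1
    return {student: dict(cats) for student, cats in sheet.items()}


# Hypothetical session: student 3 gives a reason twice and opposes the
# group twice; student 5 takes a position once.
events = [
    (3, "Gave a Reason"),
    (3, "Gave a Reason"),
    (3, "Opposed Group Alone"),
    (5, "Took a Position"),
    (3, "Opposed Group Alone"),
]
result = tally(events)
```

In this sketch the second "Gave a Reason" for student 3 is ignored, while both instances of "Opposed Group Alone" are kept, which mirrors the exception stated in the instructions.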






Each of the seven behaviors is given in as much detail as the general instructions.
Examples of directions to observers for one of the seven behaviors to
be observed follow:



Directions to Observer for Behavior

"Steered Task"

Score subjects in the "Steered Task" column on page D-45 for the kinds of
behavior listed below. (Do not score nonverbal behavior which might seem to
fit the category or an utterance you are in doubt about):

 1. Attempts to organize the task for the group or attempts to change some
    procedure for accomplishing the task. (Do not score when S tries to
    steer the group toward an incorrect or irrelevant performance of the
    task, e.g., "Let's not worry about writing anything down; let's just have
    a good discussion on these issues.")

 2. Notes the need for organization or change in procedure. (Asking whether
    the proper procedure is or is not scored)

 3. Notes the need for a chairperson.

 4. Calls for a vote or notes the need for consensus.

 5. Reminds others what the main task is or what the rules are. (Merely
    reminding others of their next step is not scored.)

 6. Tries to stop others from cutting up or arouses drifters. (Merely
    attempting to quiet the group is not scored.)

 7. Notes that present discussion is on a tangent.

 8. Notes time priorities and stresses the importance of time in completing
    the task.

 9. Volunteers or agrees to write down task products or expresses the need
    for such a recording.

10. Tries to move the group on to the next step and gives a specific procedural
    reason for doing so. (Trying to change the topic or proposing a new topic
    to be discussed is not scored unless S gives a procedural reason for doing
    so such as lack of time. Don't score "What's next?")

For those who may be interested in results, this is what happened in the 1970
survey:



Results of 1970 Survey 

% Who Did This at Least Once, by Age

* * ' 11 ' ,^11 

Took a clear position                                      62%     67%

Gave a reason for a point of view                          67      79

Sought information related to the game from
other team members or from the administrator               54      55

Steered the task by organizing the group or
by suggesting a change in procedure                        51      39

Defended the right of another group member to
be heard or to hold a different opinion                     4       1

Defended own viewpoint contrary to a previous
consensus                                                   6      24



Cautions in the Use of Observation Instruments. Any time more than one
person is involved in collecting data with an observation instrument, the
program evaluator must be concerned with consistency of those data. Standardized






D-57 



instruments usually have specific directions and information on inter-rater
reliability. Instruments that are not standardized probably do not have this
feature and the program evaluator must make his own provisions. In general,
whether or not the instrument is standardized, it is good practice to use the
following procedure:



Assuring Inter-Rater Reliability on
Observation Instruments

1. Train raters in use of instruments.

2. Have raters use the instruments on a group similar to
   that they will be observing (field test).

3. Compare results of raters at field-test stage.

4. If results are not the same, discuss dissimilarities,
   and retrain raters, or revise instrument.

5. Repeat steps 2 through 4 until satisfactory results are
   obtained.
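Step 3, comparing the results of raters, can be made concrete with a small calculation. The sketch below is an illustration only: the Guide does not prescribe a statistic, and both simple percent agreement and Cohen's kappa (a chance-corrected agreement index in common use) are brought in here as assumptions. The function names and the field-test data are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of observations on which two raters recorded the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of observations")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per code.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical field-test data: 1 = behavior checked, 0 = not checked,
# for the same eight students scored by two raters.
rater_1 = [1, 1, 0, 1, 0, 0, 1, 1]
rater_2 = [1, 0, 0, 1, 0, 1, 1, 1]
agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
```

What counts as "satisfactory" in step 5 is a judgment for the evaluator. A low value at the field-test stage is a signal to discuss the disagreements and retrain raters or revise the instrument, as step 4 describes.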



Selecting an Observation Instrument. When selecting an observation
instrument, consider the following four steps:

1. Define the factors or traits that match the program/evaluation
   objectives. For example, if the objective stated that individualized
   instruction should occur in the classroom and that teachers should
   use a range of equipment and materials and aides, all these factors
   should be covered somewhere in the observation instrument.

2. Identify existing observation instruments and determine that they
   deal with those factors. For example, use the Simon and Boyer (1967)
   edited text Mirrors for Behavior, volumes 1-4, as a resource for
   identifying appropriate observation items and formats or make adaptations
   from existing instruments. (Anita Simon and E. Gil Boyer,
   Philadelphia, Research for Better Schools, 1967.) Or, refer to Good
   and Brophy cited earlier in this section.

3. Gauge the advantages and disadvantages of the instruments. Use
   questions such as the following to help assess the worth of an
   instrument:






Content Validity:

• What kinds of data can the evaluator collect
  when using the instrument?

• To what extent are the data going to provide
  the evaluator with the needed information?

Reliability:

• To what extent can the evaluator trust the
  data produced by the instrument?

Technical Information:

• Is there back-up statistical information?

Scoring:

• What kind of scores are generated?

Usability:

• How long does it take to administer?

• How much support equipment is required?

• Are the instructions easy to understand?

• How difficult is it to train someone in
  its use?

Cost:

• How much will it require in resources
  (time/money/personnel)?

Remember need for:

• Comprehensive set of instructions; and

• Training for the observer(s).



Learning Exercise 11 
D-59 



LEARNING EXERCISE 11: CRITICIZING A CLASSROOM OBSERVATION INSTRUMENT



Directions: Study the observation instrument on the next page and make judgments
            about its adequacy. On the sheet following the instrument,
            record your responses. Think about how you would use it and the
            kinds of information you would get from it. "Acceptable" means
            you think you could use it and get useful information. Consider
            the following:

1. Identifying information: In six months will you know where it came
   from?

2. Scale points: Are they well-defined and functional?

3. Directions for use: Is it clear how the observer proceeds?

4. Coverage: Are most important classroom variables included?

5. Clarity and scorability: Are items to be observed clearly specified and
   free of ambiguity?






D-60 






Classroom Observation Instrument

Teacher ________
Observer ________

RATING SCALE:   Good   Adequate   Below Ave.   Poor   N/A

Students

 1. Students begin work with minimal teacher direction.

 2. Students concentrate on their own work, with minimal distractions.

 3. Students seek out staff and other students for assistance.

Staff

 4. Staff prepares materials in advance and is available before and after
    class.

 5. Staff interacts appropriately with students at their level, in
    conversational manner, and with enthusiasm.

 6. Staff operates in team-like manner and assists each other as needed.

Room

 7. Classroom zones and areas are well-defined for students and staff.

 8. Classroom is comfortable (temperature, visual displays, physical
    arrangements).

 9. Physical space is efficiently used by staff and students.

Materials

10. Materials are clearly marked and available to students.

11. Books and other materials are displayed to catch student interest.

12. Adequate amount of materials is available for carrying out the
    program.

Program

13. Realistic student goals are encouraged and appear to be known by the
    students.

14. Record-keeping procedures (attendance and student progress) are
    maintained and easily provide information to the staff all the time.

15. Student programs are checked and modified as needed.

16. Some evidence of the purpose and offerings of the program can be seen
    in the room or in the students' materials.



D-61

Criticizing a Classroom Observation Instrument

RESPONSE SHEET

                                 Acceptable
                                 Yes    No     If not acceptable, what is the reason?

1. Identifying information

2. Scale points

3. Directions for use

4. Coverage

5. Clarity

6. Scorability






Criticizing a Classroom Observation Instrument

ANSWERS

                                 Acceptable
                                 Yes    No     If not acceptable, what is the reason?

1. Identifying information              X      Not enough space to write I.D.; no date
                                               given; layout unattractive.

2. Scale points                         X      No attempt to provide observer with frame
                                               of reference; what is "good"? What is
                                               "poor"? Labels don't seem to fit with
                                               items to be observed.

3. Directions for use                   X      None exist! How long does one observe
                                               and under what conditions? What is the
                                               purpose of this instrument?

4. Coverage                      X

5. Clarity                              X      Too much is left to the interpretation
                                               of the observer. What is "minimal teacher
                                               direction"? How can you tell if space is
                                               "efficiently used"?

6. Scorability                                 Sensible scoring system could easily be
                                               devised and directions included for
                                               applying it. Summary information could
                                               then be taken off the completed instruments
                                               for analysis.



Other Behaviors

Often an evaluator can gain access to data which are readily available and
which do not require a formal data-collection instrument. Such data are
called unobtrusive measures.

Use of these types of measures is appropriate if an evaluation objective
suggests that specific changes are expected and if available information
shows that these changes are occurring. For example, the attendance rate of
students will increase; the use of the reading lab will increase; the grade-
point average will improve; the number of cuts from a class will decrease; the
number of discipline referrals to the principal's office will decrease, and so
on. Data of these kinds can be useful in the evaluation design as additional
indexes of program success.

Unobtrusive measures can also be used as indirect measures of attitudes
and interests. For example, instead of asking a student, "Do you ever, of
your own accord, read humorous stories or books of satire?", you might check
the school library to see what the circulation records show for this category.
To find out what science topics are most popular, you could look for pages in
the encyclopedia of science that are worn, have thumbprints, or are dog-eared.
To find out if a new unit or program is interesting to its participants, you
could check absence records before it starts and periodically while it is in
progress.
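The absence-record check just described amounts to comparing two rates. The following is a minimal sketch under stated assumptions: the function names, the use of enrolled student-days as the denominator, and the sample counts are all hypothetical and are not taken from the Guide.

```python
def absence_rate(days_absent, days_enrolled):
    """Absences as a fraction of enrolled student-days."""
    if days_enrolled <= 0:
        raise ValueError("days_enrolled must be positive")
    return days_absent / days_enrolled


def change_in_absence_rate(before, during):
    """Return (rate_before, rate_during, difference) for two periods.

    Each period is a (days_absent, days_enrolled) pair. A negative
    difference means absences fell while the program was running.
    """
    rate_before = absence_rate(*before)
    rate_during = absence_rate(*during)
    return rate_before, rate_during, rate_during - rate_before


# Hypothetical counts taken from attendance records for the same class.
rate_before, rate_during, difference = change_in_absence_rate(
    before=(90, 1500), during=(60, 1500)
)
```

A drop in the rate is only an indirect indicator. Seasonal illness or schedule changes can move absence figures as well, so a comparison like this is best read alongside the other measures in the evaluation design.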

The greatest advantage of unobtrusive measures is that the data-collection
procedures do not themselves influence the results. Students may behave
differently when an observer is present or when they are taking a test or
answering a questionnaire. The experience of taking tests itself may influence
subsequent performance. But with unobtrusive measures, students are unaware
that program-related data are being collected. As a consequence, their
behavior in the program is unaffected.



Metfessel and Michael (1967) have compiled an extensive list of unobtrusive
indicators of student behaviors. An abbreviated list follows:

Indicators of Status or Change in Student Behavior Other Than Those
Measured by Tests, Inventories, and Observation Scales in Relation
to the Task of Evaluating Objectives of School Programs

1. Anecdotal records and case histories: critical incidents noted
   including frequencies of behaviors judged to be highly undesirable
   or highly deserving of commendation

2. Attendance: frequency and duration when attendance is required
   or considered optional (as in club meetings, special events, or
   off-campus activities)

3. Autobiographical data: behaviors reported that could be classified
   and subsequently assigned judgmental values concerning
   their appropriateness relative to specific objectives concerned
   with human development

4. Citations: commendatory in both formal and informal media of
   communication such as in the newspaper, television, school
   assembly, classroom, bulletin board, or elsewhere

5. Extracurricular activities: frequency or duration of participation
   in observable behaviors amenable to classification such as
   taking part in athletic events, charity drives, cultural activities,
   and numerous service-related avocational endeavors

6. Grade placement: the success or lack of success in being
   promoted or retained; number of times accelerated or skipped

7. Performance: awards, extra-credit assignments and associated
   points earned, number of books or other learning materials
   taken out of the library, products exhibited at competitive
   events

8. Recidivism by students: incidents (presence or absence or
   frequency of occurrence) of a given student's returning to a
   probationary status, to a detention facility, or to observable
   behavior patterns judged to be socially undesirable (intoxicated
   state, dope addiction, hostile acts, sexual deviation)



Adapted from Metfessel, N.W., and Michael, W.B. A paradigm involving
multiple criterion measures for the evaluation of effectiveness of school
programs. Educational and Psychological Measurement, 1967, 27, 931-943.



Other possible indicators include: absences, appointments kept or
broken, assignments completed, changes in program or in teacher as
requested by student, choices expressed or carried out, disciplinary
actions taken, number of dropouts, elected positions held, grade-point
average, grouping, homework assignments, leisure activities, library card
possessed, numbers of units or courses carried, peer group participation,
recommendations or other referrals, skills, social mobility, tardiness,
transiency, and transfers and withdrawals from school.

Indicators of Status or Change in Cognitive and Affective Behaviors
of Teachers and Other School Personnel in Relation to the Evaluation
of School Programs

1. Attendance: frequency of, at professional meetings or at inservice
   training programs, institutes, summer schools, colleges
   and universities (for advanced training) from which inferences
   can be drawn regarding the professional person's desire to
   improve his competence

2. Mail: frequency of positive and negative statements in written
   correspondence about teachers, counselors, administrators, and
   other personnel

3. Memberships, including elective positions held in professional
   and community organizations; frequency and duration of association

4. Rating scales and checklists (e.g., graphic rating scales or
   the semantic differential) of teachers' behaviors in the classroom
   or of administrators' behavior in the school setting
   regarding changes of behavior in professional competence,
   skills, attitudes, adjustment, interests, and work efficiency

5. Records and reporting procedures practiced by administrators,
   counselors, and teachers; judgments of adequacy by outside
   consultants

Other possible indicators include: articles written; grade-point
average; load carried by teacher; moonlighting; nominations by
peers, students, administrators, or parents for outstanding service
and/or professional competencies; termination; requests for transfers.



D-66 



Indicators of Community Behavior in Relation to the Evaluation of
School Programs

1. Alumni participation: numbers of visitations; extent of involvement
   in PTA activities; amount of support of a tangible (financial)
   or a service nature to a continuing school program or activity;
   attendance at special school events, at meetings of the board of
   education, or at other group activities by parents

2. Conferences between parent-teacher, parent-counselor, parent-
   administrator sought by parents; frequency of

3. Letters (mail): frequency of requests for information, materials
   and service; frequency of praiseworthy or critical comments about
   school programs and services and about personnel participating
   in them

4. Participant analysis of alumni: determination of locale of
   graduates, occupation, affiliation with particular institutions
   or outside agencies

5. Parental response to letters and report cards upon written or
   oral request by school personnel: frequency of compliance by
   parents

6. Telephone calls from parents, alumni, and from personnel in
   communications media (e.g., newspaper reporters): frequency,
   duration, and quantifiable judgments about statements monitored
   from telephone conversations

Interview data



Even though no formal instrument is required, some device (logs or
summary sheets) must be devised to collect unobtrusive measures.

5. SOURCES OF INFORMATION ABOUT INSTRUMENTS

Buros' Mental Measurements Yearbook is the best-known resource for locating
published assessment instruments, but certainly not the only one. The
references listed here give fairly comprehensive coverage over a wide range
of instrument types except for criterion-referenced tests, which have been
treated in some detail in the discussion on achievement tests. More complete
references can be found in item VII, An Annotated Bibliography of Guides for
Test Selection, of Section J in this Guide.



D-67



Where to Locate Information About Assessment Instruments

Source                                 Type of Information

Buros' Seventh Mental                  Critical reviews on currently published
Measurements Yearbook                  standardized tests

Buros' Tests in Print                  Comprehensive test bibliography and index to
                                       first six Mental Measurements Yearbooks

Center for Study of                    Ratings on validity, reliability, appropriateness,
Evaluation at University               ease of administration, etc., on published
of California at Los Angeles           standardized tests
(Hoepfner)

Test Publishers' Catalogs              Newest materials (sometimes not found in
                                       Buros)

Tests and Measurements                 Experimental instruments in child development
in Child Development                   (self-concepts, attitudes, social behavior)
(Johnson and Bommarito)

Socioemotional Measures                Descriptions of 143 tests and measures of
for Preschool and Kindergarten         social and emotional development (includes
Children (Walker)                      some technical information)

Measures of Social                     Critical reviews of tests (mostly experimental)
Psychological Attitudes                in 8 general categories: life satisfaction,
(Robinson and Shaver)                  self-esteem, alienation, authoritarianism,
                                       sociopolitical attitudes, values, general
                                       attitudes toward people, and religious attitudes

Mirrors for Behavior                   Existing observation instruments
(Simon and Boyer)

ERIC Clearinghouse on                  Has annotated bibliographies of tests in many
Tests, Measurement, and                areas: measures of social skills, measures
Evaluation (Educational                related to school-based attitudes, self concept,
Testing Service)                       educationally disadvantaged, assessment of
                                       teachers, criterion-referenced tests

ETS Test Collection                    A library of some 10,000 tests and other
                                       measurement devices representing the instruments
                                       of all publishers. Access is based on guidelines
                                       of the American Psychological Association.
                                       Address specific inquiries by mail or telephone.
                                       A quarterly Test Collection Bulletin is
                                       available on a subscription basis.

Professional Journals:                 Reviews and validity studies of recently
  Educational and Psychological        published or revised tests
  Measurement
  Journal of Educational
  Measurement
  Journal of Counseling
  Psychology
  Personnel and Guidance
  Journal



D-68 



Use Multiple Measures Whenever Possible

It is sometimes easy to shoot down a single measure of student achievement by
discrediting its score for some technical reason. Credibility of program
evaluation may be enhanced by the use of several measures of program effectiveness.
The inclusion of attitudes of parents, students, and staff as well as
unobtrusive measures of student behavior will broaden the base of information
from which judgments can be made.

Summary of Factors to Consider in Evaluating and Selecting
Assessment Instruments

1. Reliability
      Does the instrument give the same results when repeated?

2. Validity
      Does the instrument measure what it says it measures? Does the
      content match your program objectives? Is it free of bias for
      different subgroups?

3. Content
      Are the items related to program objectives?

4. Administration Mode and Time
      Is the instrument administered in groups or individually, by
      interview or observation? What qualifications does the
      administrator need? Are directions for administration adequate?
      Is equipment required for administration available? Is time
      required reasonable for the results expected?

5. Scoring
      Is scoring by hand or by machine? Are directions for scoring
      adequate?

6. Format and Interest
      What is the general editorial quality? Will it hold the test
      taker's interest, and are directions easy for test takers to
      understand?

7. Scores and Norms
      If normed, what are the characteristics of norm groups? When was
      it normed? Are interpretive aids available?






8 . 



Learning Exercise 12 
D-69



LEARNING EXERCISE 12: SELECTING NORM-REFERENCED TESTS



Directions: Select and check one of the four objectives listed below, then
            consult the descriptive information on D-71 - D-72. Select up
            to three instruments you think would be appropriate for that
            objective and list them on the following page. Record a "yes" in
            those boxes in which answers to the questions seem to be affirmative.



Objectives

1. Second- and third-grade students taking part in the bilingual-bicultural
   program will have a mean score of 20 or higher on the
   ____________ series of tests of cultural similarities
   and differences. One test will be given after completion of
   each cultural unit.

2. The median percentile rank in reading comprehension for third-grade
   students participating in the remedial program will be eight points
   higher on the posttest given in May than on the pretest given to the
   same students in October. The test to be used is ____________
   test.

3. Kindergarten children at School Z with a 75 percent or better attendance
   will show nine months gain or more in language usage on the
   ____________ language test after nine months of instruction.

4. All tenth grade students receiving remedial math instruction will
   show at least a five-month mean gain in math computations for every
   five months of instruction. Gain will be measured by the
   ____________ test.



At page D-73, judgments by specialists about the listed tests are shown.



D-70



Selecting Norm-Referenced Tests

                                          Instrument 1    Instrument 2    Instrument 3
Criteria                                  Name ______     Name ______     Name ______

1. Is the instrument a valid measure?

2. Is the instrument a reliable measure?

3. Is the instrument appropriate to use
   on the population to be assessed?

4. Does the instrument yield objective
   data?

5. Is the instrument easy to administer
   and score?

6. Are minimum time and resources required
   to administer and score the instrument?

7. Is the administration of the instrument
   nondisruptive to classroom learning
   activities?

8. Will the instrument provide data which
   are useful for decision making at both
   the classroom level and the
   program-administrative level?

9. Is the cost of the instrument reasonable
   and within budgetary constraints?




D-71



BRIEF DESCRIPTION OF TESTS

American School Achievement Tests by Robert V. Young, et al.; Level -
Primary I (Grade 1), Primary II (Grades 2-3), Intermediate (Grades 4-6),
Advanced (Grades 7-9); Forms A, B, D, E; 1955-59 (BMC)

   Subtest:  Primary I: Word recognition, word meaning, numbers.
             Primary II: Sentence and word meaning, paragraph meaning,
             computation, problems, language usage, spelling.
             Intermediate: Sentence and word meaning, paragraph meaning,
             arithmetic computation, arithmetic problems, language, social
             studies, science.
             Advanced: Sentence and word meaning, paragraph meaning,
             arithmetic computation, arithmetic problems, language, spelling,
             social studies, science.

CIRCUS, anonymous; Age: 4-5 years; 1972 (ETS)

   Subtest:  What words mean, how much and how many, look-alikes, copy what
             you see, finding letters and numbers, noises, how words sound,
             how words work, listen to the story, say and tell, do you
             know, see and remember, think it through, make a tree, activities
             inventory, teacher questionnaire, test-taking behavior

Comprehensive Tests of Basic Skills, anonymous; Level - I (Grades 2.5-4),
Level II (Grades 4-6), Level III (Grades 6-8), Level IV (Grades 8-12); Forms
Q, R & S; 1968, 1973 (CTB). Form S only: A & B (Grades K-1), C (Grades
1.5-2).

   Subtest:  Reading, language, arithmetic, study skills.

Cooperative Primary Tests, anonymous; Level ...; Form B; 1965 (ETS)

   Subtest:  Reading, listening, word analysis, mathematics, writing
             skills.

Durrell Listening Reading Series by Donald D. Durrell, et al.; Level -
Primary (Grades 1-3.5), Intermediate (Grades 3.5-6), Advanced (Grades 7-9);
Form DE; 1969 (HBJ)

   Subtest:  Vocabulary listening, paragraph listening, vocabulary reading,
             paragraph reading.






Gates-MacGinitie Reading Tests by Arthur Gates, Walter MacGinitie; Level -
Primary B (Grade 2), Primary C (Grade 3), Survey D (Grades 4-6), Survey F
(Grades 10-12); 1964-69 (BEM)

   Subtests: Primary levels: Vocabulary and comprehension.
             Survey levels: Speed, accuracy, vocabulary and comprehension.

Metropolitan Achievement Tests, 1970 Edition by Walter N. Durost, et al.;
Level - Primary I (Grades 1.5-2.4), Primary II (Grades 2.5-3.4), Elementary
(Grades 3.5-4.9), Intermediate (Grades 5.0-6.9), Advanced (Grades 7.0-9.5);
Forms F & G; 1959 edition, Forms A and B also available (HBJ).

   Tests:    Primary: Listening for sounds, reading, numbers.
             Primary I: Test 1 - Word knowledge, Test 2 - Word analysis,
             Test 3 - Reading, Test 4 - Mathematics.
             Primary II: Test 1 - Word knowledge, Test 2 - Word analysis,
             Test 3 - Reading, Test 4 - Spelling, Test 5-7 - Mathematics
             computation, concepts, problem solving.
             Elementary: Test 1 - Word knowledge, Test 2 - Reading,
             Test 3 - Language, Test 4 - Spelling, Test 5-7 - Mathematics
             computation, concepts, problem solving.
             Intermediate & Advanced: Test 1 - Word knowledge, Test 2 -
             Reading, Test 3 - Language, Test 4 - Spelling, Test 5-7 -
             Mathematics computation, concepts, problem solving, Test 8 -
             Science, Test 9 - Social studies.

Stanford Early School Achievement Tests by Richard Madden and Eric Gardner;
Level - I (Grades K.1-1.1), II (Grade 1.0-1.8); 1969 (HBJ)

   Subtests: The environment, social studies and science, mathematics,
             letters and sounds, aural comprehension, word reading,
             sentence reading

Tests of Basic Experiences by Margaret H. Moss; Level - K (Preschool-K); L
(Grades K-1); 1970 (CTB)

   Subtests: General concepts, mathematics, language, science, social
             studies (Also, Spanish directions, supplement to manual
             available).



[Table: judgments by specialists about the listed tests. Each test is rated
"Good" or "Questionable" on each criterion; some ratings are qualified
("Questionable on scoring," "Questionable on administration," "Questionable
if scored by hand," "Good if matched with content taught"). Tests rated:
American School Achievement Tests, Primary I, II; Circus, Level 4-6 Years;
Comprehensive Tests of Basic Skills, Forms A, B, C; Durrell Listening Reading
Series; Gates-MacGinitie Reading Test; Metropolitan Achievement; Readiness
Skills Test; Stanford Achievement Test; Stanford Early School Achievement
Test; Test of Basic Experiences, Levels K-L.]



Ratings were given by only two persons with experience in this field. The ratings may differ greatly when reviewed by a larger number of persons.



D-74 Learning Exercise 13 

LEARNING EXERCISE 13: SELECTING APPROPRIATE INSTRUMENTS



Directions: Abbreviated portions of objectives are listed below. Decide what
type of measuring instrument would be most appropriate to use for
each. Record the letter representing that type.

Instrument Type

A. Norm-Referenced Test

B. Criterion-Referenced Test

C. Questionnaire

D. Observation Record

E. Not enough information is provided to make a decision.

____  1. Parents of participating compensatory education students will have
         positive attitudes.

____  2. Twenty-five percent of the students will achieve one standard
         deviation above the national mean.

____  3. Students will interact in a positive social manner during class
         activities.

____  4. Given a list of South American countries, students will be able to
         list the capital of each country.

____  5. The number of discipline referrals to the principal will be reduced
         by 50 percent.

____  6. Students will check out books in category "A" more frequently than
         books in category "B".

____  7. At the end of the semester, students in the values clarification
         class will exhibit positive attitudes toward their parents' ethnic
         background.

____  8. At the end of the inservice workshop, teachers will be able to answer
         correctly 8 out of 10 cognitive questions based on content of the
         workshop.

____  9. Teacher effectiveness in promoting student interaction will increase.

____ 10. The majority of parents with students attending school "X" will be
         aware of the auxiliary services available through the school.






ANSWERS 



 1. C

 2. A

 3. D

 4. B

 5. E

 6. E

 7. C

 8. B

 9. D

10. C



6. LOCATING EXISTING ASSESSMENT INSTRUMENTS VS.
DEVELOPING ASSESSMENT INSTRUMENTS LOCALLY

Examples of every type of instrument discussed in this section exist somewhere.
The program evaluator who spends time searching for available instruments that
will meet his needs usually will be far ahead of the one who decides to
launch a school-wide or district-wide effort to develop tests, questionnaires,
or observation records locally. The development of good assessment instruments
is a much more exacting and demanding task than is often realized. Question-

even adaptation takes care and thought. Criterion-referenced tests that can
be assembled for your purposes from existing items are becoming increasingly
available through commercial sources. The selection of any type of instrument
must, of course, be done with care.

Developing Instruments

Unless you have highly trained technical staff and sufficient time and money,
the development of instruments locally should be undertaken only as a last
resort. Time and costs will vary with the magnitude of the job. A reasonably
straightforward achievement test should take eight months to a year to
develop and a year to review, field test, and revise. The adaptation of an
existing instrument for local use may be done in considerably less time,
but even so, the field test, review, and revision steps should be given time
to run their course.

People who write items, whether for achievement tests, questionnaires,
observation, or other measures, must know the content area to be measured and
must know basic techniques for item construction. In most cases, it is easier
to train a content person in the art of test construction than to take a
professional item-writer and teach him/her the content area. However, it is
sometimes advisable to get persons with the two separate skills and have them
work as a team. In any case, the content person should know the basics of
good item construction. There are easy-to-follow rules in any basic text in
measurement and evaluation. (See selected bibliography at the end of the
Guide starting at I-1.)

The development of an evaluation instrument begins with a plan that
specifies the information wanted. Determine what questions you want answers
to. Program objectives serve as the basis for planning measurement needs.

Items are then written. It is good practice to develop more items than
will be needed in the final instrument because some will be lost through
review and field testing.

One of the greatest difficulties in constructing objective items is in
getting plausible options. In the case of achievement tests, there must be
one and only one best answer. The other options (or distractors) should be
plausible answers for persons who do not know the right answer. In the
development of questionnaire items, it is often impossible to anticipate all
appropriate choices a person could make. A preferred procedure is to first
administer the items in open-ended form and use the responses
from field tests to set options. Alternatively, an item writer might try to
guess what these options or choices are. For example, if you are designing an
item to determine professional growth among staff members, you might guess at
some possible activities, and then allow for an "Other" response.
Example: 1. In which of the following professional activities have you
            participated in the past year?

            ____ Enrolled in college or university course

            ____ Attended special workshop sessions

            ____ Observed in other classrooms

            ____ Done independent reading

            ____ Consulted with specialist

            ____ Other ______________________

For achievement test items, a completely open-ended format should be used
if there is any doubt about being able to anticipate good distractors. In
such cases, the most frequently given wrong answers provide the best distractors.
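The tallying step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the sample item, and the responses are hypothetical, and answers are compared after simple lowercasing and trimming.

```python
from collections import Counter

def pick_distractors(responses, correct_answer, how_many=3):
    """Return the most frequently given wrong answers, most common first."""
    key = correct_answer.strip().lower()
    # Normalize case and spacing so "lima" and "Lima " count as one answer.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    wrong = [r for r in cleaned if r != key]
    return [answer for answer, _count in Counter(wrong).most_common(how_many)]

# Hypothetical open-ended field-test responses to "What is the capital of Peru?"
field_test = ["Lima", "Quito", "lima", "Bogota", "Quito",
              "La Paz", "Quito", "Bogota", "Santiago"]
distractors = pick_distractors(field_test, "Lima")
print(distractors)
```

The wrong answers that come back most often are candidates for options in the objective version of the item; a person who knows the content should still review them.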

Field testing should include all concerns related to the collection of
data:

1. Are there adequate procedures for training persons who will
   collect the data?

2. Are the directions for administration clear and understandable?

3. Does the instrument itself give the kind of information you are
   seeking?

4. How long does test administration take?

5. Does the scoring key work?



The answers to such questions come from various sources and include both
"hard" and "soft" data. There are statistical procedures for analyzing the
data from the instrument itself, but getting oral responses by interviewing
participants (both data collectors and persons tested) in a field test may
also be necessary. This may be by group interview and should cover such
points as clarity of tasks, ambiguities in individual items, and the actual
mechanics of data collection.

Obviously, the group on whom you do the field test should be similar to
the groups on whom you expect to use the instrument, but should not consist of
members of that group.

Evaluators who find they must produce locally developed tests are strongly
advised to seek help from measurement specialists. The essential steps in
instrument development are outlined below.



Instrument Development Procedure

Activity                           Questions to be Answered and Cautions

Develop plan.                      What information do you want?

Prepare draft.                     Developers should be able to write well
                                   and have knowledge about program content.

Have several persons review        Are there ambiguities, omissions, and
the first draft.                   unnecessary pieces of information
                                   requested?

Develop directions for             Try to assure that data will be collected
administration.                    under standard conditions.

Develop scoring key.               Can each part be scored, and is there
                                   agreement on the scoring?

Field test and prepare draft,      Administer it to persons like those on
including the directions           whom you plan to use it; ask them to
for administration.                criticize it; get time estimates.

Revise the instrument.             What went wrong? Fix it.

Repeat review, field-test,         Are you satisfied that the information
and revision steps as              obtained from this instrument will
necessary.                         answer the original questions you wanted
                                   to have answered in the first item?



The preceding are management steps and do nothing to assure technical
adequacy. Validity studies can be planned which use an outside criterion you
believe to be independent of what you are trying to measure with your
instrument. For example, if you want to validate a test of reading
comprehension, you might do any or all of the following:

1. Ask teachers to rank students in order of their respective competencies
   in reading comprehension.

2. Ask parents how well their children can read and understand newspaper
   items.

3. Ask teachers what proportion of their students will pass
   each item on the test.

Checking for reliability is more involved and beyond the scope of the Guide.
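As an illustration of the first validity check, teacher rankings can be compared with test scores by a rank correlation. The sketch below is not a procedure prescribed by the Guide; it assumes Spearman's rank correlation as the comparison, and the six students, rankings, and scores are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (largest) downward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(teacher_ranks, test_scores):
    """Spearman correlation: Pearson correlation computed on the two sets of ranks."""
    return pearson(teacher_ranks, ranks(test_scores))

# Hypothetical data: teacher ranking (1 = strongest reader) and test raw scores.
teacher_ranks = [1, 2, 3, 4, 5, 6]
test_scores = [48, 45, 40, 41, 30, 25]
rho = spearman(teacher_ranks, test_scores)
print(round(rho, 3))
```

A value near 1 means the test orders students much as the teacher does; a value near 0 suggests the test and the outside criterion are measuring different things.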

7. REVIEW

As you have seen, there are several kinds of instruments that can be used for
program evaluation. Which one(s) you select will depend to a great extent upon
the nature of the evaluation design and the kinds of information that need to
be collected.

Here is a review of what we have covered in this section of the Guide:

1. It is important to match instruments to program objectives.

2. Multiple measures for each program objective are desirable.

3. Different techniques and various types of instruments can be
   used.

4. There are both advantages and disadvantages to using different
   instruments and data-collection methods.

5. There are many sources of information on existing instruments.

6. It is generally better to adapt or adopt existing instruments
   than to develop new ones locally.

7. The development of adequate instruments locally is a costly,
   time-consuming, and demanding task.



SUMMARY OF BASIC EVALUATION INSTRUMENTS

              Types of      Primary               Primary
              Evaluation    Categories of         Structure        Kinds of
              Measures      Instruments           of Items         Scores

              Achievement   Norm-Referenced       Objective        Raw Scores
                            Tests                  . True/False    Grade Equivalents
                            Criterion-             . Multiple      Percentiles
                            Referenced Tests         Choice        Standard Scores
                                                   . Matching      Stanines
                                                                   Percentages
                                                  Open-Ended       Categories

PROGRAM   --  Attitude      Questionnaires        Objective        Frequencies
OBJECTIVES                                         . Yes/No        Percentages
                                                   . Multiple      Ratings
                                                     Choice
                                                   . Ratings
                                                  Mixed

              Interaction   Observation           Objective        Frequencies
                            Record Forms           . Ratings       Percentage
                                                   . Present/      Ratings
                                                     Absent        Time

              Other         Logs                  Open-Ended       Frequencies
              Behaviors      . Referral Reports   Objective        Categories
                             . Attendance
                             . Cuts
                             . Grades
                             . Diaries



PROGRAM EVALUATOR'S GUIDE 



Section E 

COLLECT THE DATA



The Evaluation Improvement Program



PRECIS 



After available instruments have been selected or plans made for developing
new ones, the evaluator must then plan how to collect the data. Collecting
data is a sensitive part of program evaluation, because its success depends
so much on the cooperation and often the hard work of others who are not
connected with the evaluation team. Moreover, the logistics of moving
evaluation instruments and data from place to place and the complexities of
schedules involving hundreds of people present challenges not encountered
in other parts of the program evaluation process.

Planning for data collecting involves specifying the subpopulations
that are to serve as sources of information, deciding on who will be
responsible for collecting the information, and deciding whether the
collection will be carried out on an individual or a group basis. Special
arrangements need to be planned to follow up when people are absent from
group sessions or when individuals do not return questionnaires.



CONTENTS

                                                                      Page

1. INTRODUCTION ..................................................... E-1

2. BASIC CONSIDERATIONS ............................................. E-1

     Arrangements with School/Program Personnel ..................... E-2

     Personnel Who Collect the Data ................................. E-3

     Training Needed for Those Who Will Collect the Data ............ E-4

     Time Schedule .................................................. E-5

     Monitoring the Data-Collection Process ......................... E-5

3. THE MECHANICS OF DATA COLLECTION ................................. E-6

     Collection of Data from Groups of Persons ...................... E-6

     Collection of Survey Data ...................................... E-10

   LEARNING EXERCISE 14: PLANNING FOR DATA COLLECTION ............... E-12

4. MONITORING AND RECORDING DATA ON PROGRAM ACTIVITIES AND
   CONDITIONS ....................................................... E-15

     Formative Process Evaluation ................................... E-15

     Formative Context Evaluation ................................... E-15

     Procedures for Collecting Process and Context Data ............. E-16



1. INTRODUCTION

No matter how reliable and valid an instrument is, its usefulness can be
completely destroyed by carelessness in the collection and handling of the
data.

The problems that can arise during the data-collection stage are many
and varied. Among the more common are these:

     • Reading scores obtained the day before a
       vacation may not be comparable to those
       obtained a week earlier.

     • Responses in an interview situation may be
       influenced by the race, sex, or status of
       the interviewer.

     • Interruptions or faulty directions can
       "destandardize" a standardized test.

     • Voluntary responses to a mailed question-
       naire may not be representative of the
       total population.

In addition, any data-collection plan must take into account a variety of
logistical problems, the importance of which increases geometrically as the
number of students, teachers, and schools involved in the program evaluation
increases.



2. BASIC CONSIDERATIONS

Some of the most important considerations in planning for data collection
concern:

     • arrangements with school/program personnel

     • personnel who will collect the data

     • training needed

     • time schedule

     • monitoring the total data-collection process



Arrangements with School/Program Personnel

In the several stages of collecting data, the evaluator needs the
cooperation of school personnel. For example:

     • The evaluator may need to obtain from
       the school certain records and lists of
       students, classrooms, teachers, holidays,
       faculty meetings, materials, etc., to
       select appropriate instruments and
       populations to be included in the
       program evaluation and to carry out
       other plans.

     • The evaluator may need to train school
       personnel in the administration and use
       of evaluation instruments (e.g., stand-
       ardized tests, questionnaires, surveys,
       checklists, etc.).

     • The evaluator will need permission and
       active cooperation in collecting data
       in the school.

The kind of cooperation received often depends on how aware school personnel
are of the benefits for students, teachers, and administrators expected to
result from the program evaluation.

If the evaluator is able to explain the purpose of the evaluation and
how feedback from it can be used, school personnel will respond more
cooperatively. This understanding is the key to obtaining cooperation.

There are several ways to create a favorable attitude toward evaluation.
The evaluator could convince the principal of the school to reserve
a column in the school's newsletter for periodic reports on the program and
its evaluation. The evaluator might meet with the participating teachers to
discuss the expected outcomes of the evaluation, the types of information
needed and why that information is important and necessary, and what they
will get for their efforts. The evaluator might also make personal contact
with some parents of participating students to brief them and to help organize
a parent-evaluation committee to help disseminate information related to
evaluation activities.



Personnel Who Collect the Data

One of the decisions to be made in planning for data collection concerns
those who will do the actual collecting. Should they be teachers? Secre-
taries? Students? Should they be people from inside the program or someone
not involved in the program? How much working time will be required? How
many people?

In a program evaluation involving large-scale standardized testing, all
teachers may be assigned to administer tests in their classrooms simply
because they are the only available staff resources and physical facilities.
In a smaller program evaluation, tests may be administered
by a small team of teachers, or teams of teachers and aides. In any case,
test administrators will need ample orientation and training.

The evaluator must also decide whether the persons collecting data
should be from the program being evaluated or from outside the program.
Some authorities have suggested that data collection should be carried out
by outsiders unfamiliar with the objectives of the program to bring a totally
unbiased viewpoint to the testing situation. The use of outsiders may
serve to prevent two types of bias:

     1. Program personnel may have vested interests in the
        program. They may tend to focus on those aspects
        of the program which are successful. They may
        interpret the data in a favorable manner, whether
        justified or not.

     2. Program personnel may be particularly attentive to
        program objectives and might plan their evaluation
        accordingly, overlooking effects of the program
        that are observable but unanticipated.

        EXAMPLE: Programmed instruction in mathematics
        often has the side effect of improving reading
        ability. Narrow program evaluation would focus on
        mathematics achievement and might not uncover the
        positive effect on reading achievement.

*A statewide survey in California conducted in 1975 by Evaluation Improve-
ment Project staff showed that most program evaluation data are collected
by classroom personnel.



Training Needed for Those Who Will Collect the Data

The evaluator should assess the type and amount of training required to
administer the different evaluation instruments and plan accordingly.

In the case of administering a standardized test, the basic requirements
for data collectors are a willingness and ability to follow directions
precisely. Training for this task will probably not need to be extensive:
a brief orientation supplemented by a checklist for use during test
administration. However, do not underestimate the need to periodically remind test
administrators of what good testing conditions consist of (proper room
environment, techniques of distributing and collecting materials, monitoring
students, noting conditions that may invalidate results). Do
not make the mistake of thinking that anyone can administer a standardized
test. Be sure to specify what should be done if a person is absent.

If the evaluation involves the interviewing of parents using an
interview guide, the decision as to who will administer that instrument is
more critical than for a standardized test because the validity and reli-
ability of the data collected may be substantially influenced by the
personality of the interviewer. Training in this case would be more
extensive.

When observation instruments are used, as in our reading program
example, the evaluator needs to carefully study the instrument, arrange
several pilot observations in a nonparticipating school, and possibly
train a second person to participate in the pilot observations for purposes
of comparing the data collected by two observers.

Plans should be made for both training and practice by observers.
This is especially true with instruments developed locally. When observa-
tion instruments require judgments, some check of validity and reliability
of the data collectors' ratings should be made during pilot situations.
This should be done when the instrument is being developed and early enough
to allow changes to be made prior to use.
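One way to compare the data collected by two observers during pilot observations is sketched below. The Guide does not name a statistic; this example assumes simple percent agreement plus Cohen's kappa (agreement corrected for chance), and the category labels and ratings are hypothetical.

```python
from collections import Counter

def percent_agreement(ratings_a, ratings_b):
    """Share of observation intervals on which the two observers gave the same rating."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

def cohens_kappa(ratings_a, ratings_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both observers used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical pilot data: each observer codes ten intervals as "on" or "off" task.
observer_1 = ["on", "on", "off", "on", "off", "on", "on", "off", "on", "on"]
observer_2 = ["on", "on", "off", "off", "off", "on", "on", "on", "on", "on"]

agreement = percent_agreement(observer_1, observer_2)
kappa = cohens_kappa(observer_1, observer_2)
print(agreement, round(kappa, 3))
```

Low agreement during the pilot is a signal to retrain observers or tighten the category definitions before the instrument is used in participating schools.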

Thus, the training of those who will participate in the collection of
data is very important. In effect, if the data collectors are not carefully
trained, the evaluator may have no data to analyze.



Time Schedule

The schedule for data collection will be partly determined by the evaluation
design and by the deadlines for analyzing and reporting results. It will also
be influenced by other factors ranging from school calendars and vacation
days to curriculum plans and grading periods. The schedule should be as
detailed as possible, including such things as training sessions, testing-
room preparation, space adjustments, and materials delivery, as well as the
actual data-collection activities. Do not schedule testing sessions just
before or after holidays or in close proximity to major school events.

The evaluator should coordinate, prepare, and issue a data-collection
schedule for each evaluation event. This schedule should indicate at
least the following information:

     1. WHEN the data are going to be collected
        (e.g., dates: October 22 from 10:00 a.m.
        to 11:45 a.m.)

     2. WHERE the data are to be collected (e.g.,
        Classroom A)

     3. WHO is going to collect the data (e.g.,
        the names of the persons responsible at
        each location)

     4. WHO is to be evaluated (e.g., names of
        the students or others)

     5. HOW the data will be collected (e.g.,
        name of the instrument[s] to be used).
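The five schedule items above map naturally onto a simple record, which also makes it easy to check a session against the school calendar. The sketch below is illustrative only: the class and field names, the year, the collector and instrument names, and the one-day buffer around holidays are all assumptions, not part of the Guide.

```python
from dataclasses import dataclass
from datetime import date, time

@dataclass
class CollectionEvent:
    when: date          # WHEN: date of the session
    start: time         # WHEN: starting time
    end: time           # WHEN: ending time
    where: str          # WHERE the data are collected
    collectors: list    # WHO collects the data
    subjects: str       # WHO is to be evaluated
    instruments: list   # HOW: name(s) of the instrument(s) to be used

def too_close_to_holiday(event, holidays, buffer_days=1):
    """True if the session falls on, just before, or just after a holiday."""
    return any(abs((event.when - h).days) <= buffer_days for h in holidays)

# Hypothetical entry based on the example in the text (the year is assumed).
event = CollectionEvent(
    when=date(1976, 10, 22),
    start=time(10, 0),
    end=time(11, 45),
    where="Classroom A",
    collectors=["Teacher 1"],
    subjects="Grade 3 reading program students",
    instruments=["Reading achievement test"],
)

school_holidays = [date(1976, 10, 25)]  # hypothetical school calendar
print(too_close_to_holiday(event, school_holidays))
```

A schedule kept as a list of such records can be screened in one pass before it is issued to the schools.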

" ■? ' * . ■ ' 

Monitoring the Data-Collection Process

And finally, there must be assurances that the plans are carried out as
expected, instruments are applied properly, data are recorded accurately,
absentees are accounted for properly, and that steps are taken to correct
or note untoward incidents which might bias the results. Monitoring can
help assure that factors are measured in the ways you intended to
measure them.

Procedures that are carried out carelessly during data collection may
result in the measurement of extraneous factors such as the clarity of the
directions given at the start of a test rather than of what was learned as a
result of the program. Failures in data collection can jeopardize the entire
program evaluation effort.



Group Discussion

What experiences have you had in collecting data that could help others?
What are the little things that can trip you up? Materials not arriving in
time? Wrong materials distributed or not enough materials? Has security
been a problem?

How would you have prevented the situation described in the anecdote
below?

     The principal of a small elementary school noticed that the
     answer sheets from one particular class were only about half
     filled. A large portion of the class had completed only about
     half the questions. Upon questioning the teacher who had
     administered the test, the principal found that a stop watch
     had been used to time each section of the test. However, this
     particular stop watch had a sweep-second hand that revolved
     twice for each minute. The teacher had inadvertently read one
     revolution (30 seconds) as one minute. Thus, the entire test
     had been administered in half the time it should have taken.

3. THE MECHANICS OF DATA COLLECTION 



Collection of Data from Groups of Persons 


Most data on students are gathered in a classroom setting (or in large units
such as the school cafeteria or auditorium). When data are gathered under
such concentrated conditions, it is far easier to control the situation and
get valid data than it is in a survey. Organization and attention to detail
are the keys.

Except in the smallest of schools, the collection of data from groups
requires cooperation among people and coordination of activities by many
persons. The respective roles of the program evaluator and the classroom
teacher (or other persons responsible for administering tests and question-
naires or collecting observation data) must be clearly recognized and
combined if valid data are to be gathered.

The following Checklist for the Program Evaluator and Sequence of
Activities for the Data Collector specify the steps which must be taken in
implementing an effort to collect data from groups.



Checklist for the Program Evaluator:
Collection of Data from Groups

 1. Instruments have been delivered to those administering them well
    in advance of the time they are to be used.

 2. Quantities, levels, and forms of instruments have been checked
    against actual needs.

 3. Storage in secure places has been arranged.

 4. Instruments and accompanying manuals and other materials have
    been thoroughly reviewed so that data collectors can be trained
    effectively.

 5. Persons to administer data-collection instruments have been
    carefully selected and trained and provided with their own sets
    of materials in advance.

 6. A detailed schedule has been prepared and distributed.

 7. Classroom distribution and collection procedures have been
    carefully worked out.

 8. Data collectors have been instructed what to do about absentees
    during testing periods.

 9. Specifications and arrangements for scoring have been made.

10. If scoring is to be done by an outside agency, answer sheets
    have been checked for completeness and organized for processing.

11. Test books and all other materials have been returned to secure
    storage after use.



Sequence of Activities for the Data Collector

Before data collection:

1. Study the directions for administration; examine assessment
   instruments and answer sheets.

2. Rehearse the process of administering instruments.

3. Clear up any potential problems with the program evaluator.

During data collection:

1. Prevent disruptions from outside sources (a TESTING - DO NOT
   DISTURB sign is recommended). Make sure room environment is
   comfortable.

2. Make announcements slowly and clearly. Try to motivate participants
   without causing anxiety.

3. Be sure each person has all materials and equipment needed.

4. Allow sufficient time for each person to fill in required identify-
   ing information (for young children, the data collector may need to
   do this step in advance).

5. Use the exact wording given in printed instructions. Do not improvise
   or use short cuts unless directions for administration allow for
   variation. Be sure each person understands what he or she is to do.

6. Once testing begins, walk around the room to be sure everyone is
   working. Do not answer questions related to test content.
   ("Do the best you can" or "skip that one and go on to the next"
   can be used as a response, if necessary.)

7. Stop immediately when time is up.

After data collection:

1. Collect answer sheets first, then booklets.

2. Count all materials to be sure none is missing.

3. Alphabetize and check papers against group roster.

4. Check all papers for completeness of identifying information.

5. Prepare an exceptions list. (Did anyone become ill or leave
   the room during the session? Was there an unexpected fire
   drill during the session?) Any condition that could potentially
   invalidate the results of one person or the entire class should
   be noted.

6. Arrange and organize for scoring.
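The roster check and exceptions list in the steps after data collection can be sketched as a small routine. This is an illustration under stated assumptions: the function name, the student names, and the form of the exceptions notes are hypothetical.

```python
def check_against_roster(roster, returned, exceptions=None):
    """Compare returned answer sheets with the group roster.

    roster     -- names expected in the session
    returned   -- names found on the collected answer sheets
    exceptions -- optional dict of name -> note (illness, left room, etc.)
    """
    roster_set, returned_set = set(roster), set(returned)
    notes = exceptions or {}
    return {
        # On the roster but no paper collected: absentees to follow up.
        "missing": sorted(roster_set - returned_set),
        # Paper collected but name not on the roster: check identification.
        "unexpected": sorted(returned_set - roster_set),
        # Papers whose results may be invalid and should be flagged for scoring.
        "flagged": sorted(name for name in notes if name in returned_set),
    }

# Hypothetical session data.
roster = ["Adams", "Baker", "Chavez", "Diaz", "Evans"]
returned = ["Adams", "Chavez", "Diaz", "Evans", "Ford"]
exceptions = {"Diaz": "left room for ten minutes"}

report = check_against_roster(roster, returned, exceptions)
print(report)
```

The "missing" list feeds the absentee follow-up, and the "flagged" list travels with the answer sheets so that questionable results can be identified at scoring time.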



Collection of Survey Data 



IThen data are. collected at a specific time and at* ..a speci £ic place 
(i.e., in the classroom, after an. inservice training session, at a me'ot'ing 
of the parents' groups), the program eva-^uator has greater control over 
conditions during data collection. However, not all data can be collected 
in this manner. The field survey, for example, has become one important 
method of obtaining information — particulacly from parents and community 
groups. The conditions -under which data ar^e collected from a mailed survey 
are subject to very little control by the program evaluator. However, there^ 
are a few things the evaluator can do that may heilp . increase the percentage 
of returns and usefulness of responses. ' • 

1, Keep the survey form short and to the point. Ten-page 
questionnaires o'f ten go. into the roun^ file. 

2. Make your cover letter of instructions clear and concise.' 

" • Explain why this information is important and why the * . . 
recipient of the letter was included'' in the survey ^:f^' 

' "v^^ Moti\>ation is critical in getting responses, to mailed 

' ' . \. • • 

surveys, j 

3» Encourage the respondent to complete "the questionnaire 

at one sitting and v;hen he or she is free of interruptions. 

4. Follow up the questionnaires to increase returns. It. takes 
, ; 75 percent to 80 percent to give you unbiased results. 

T^^is lasts.point is so important that it 'calls for a more detailed discussion. 

How do you obtain a high percentage return on surveys? A high rate of
return on surveys comes from a high level of effort. Mailing survey forms
or sending them home with students and waiting for the responses to come in
does not involve a high level of effort. Most persons who are going to
respond will do so within the first two weeks after receiving the forms.
Follow-up is essential.






The follow-up may be done using a number of techniques, but generally these:

1. Notices in the press and on radio during the first two weeks

2. Sending a letter and another copy of the questionnaire to
nonrespondents near the end of the second week

3. Organizing some system of personal contact to try to reach
nonrespondents and invite their cooperation

To take on the job of personally contacting all nonrespondents is
generally not practical for the program evaluator. Consider what the other
resources are. Are there school personnel who can be given lists of persons
to call? Are there volunteer parent, student, or community groups whose
aid can be enlisted? Are there "telephone trees" already organized by
parent groups that might be used?

The amount of follow-up effort that can reasonably be expected is
related to the number of persons in the original sample and, in turn, the
number of nonrespondents who must be contacted. This argues for selection
of the smallest sample feasible to achieve the desired result.

If the number of persons to be surveyed is not prohibitively large, and
if the evaluator can organize and train a small cadre of persons, a straight
telephone survey rather than a mail survey could be planned. Each telephone
interviewer would fill out a structured interview guide for each call made.
This generally gets quicker results and a higher response rate. However, it
may also mean making phone calls at night for a large portion of persons
surveyed.
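The arithmetic behind follow-up planning can be sketched in a few lines of Python. This is only an illustration: the function name, the sample of 200, and the 90 returns are hypothetical, and the 75 percent target is taken from the lower end of the range mentioned above.

```python
import math


def follow_up_plan(sample_size, returned, target_rate=0.75):
    """Summarize where a mailed survey stands against a target return rate.

    Returns (current return rate, number of nonrespondents,
    additional returns still needed to reach the target).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= returned <= sample_size:
        raise ValueError("returned must be between 0 and sample_size")
    rate = returned / sample_size
    nonrespondents = sample_size - returned
    # Round up: a fraction of a questionnaire cannot be returned.
    needed_total = math.ceil(target_rate * sample_size)
    still_needed = max(0, needed_total - returned)
    return rate, nonrespondents, still_needed


# Hypothetical survey: 200 forms mailed, 90 returned after two weeks.
rate, nonrespondents, still_needed = follow_up_plan(200, 90)
print(f"Return rate so far: {rate:.0%}")
print(f"Nonrespondents to contact: {nonrespondents}")
print(f"Additional returns needed for 75 percent: {still_needed}")
```

A count like this makes the point about sample size concrete: the larger the original sample, the larger the list of nonrespondents that someone must contact.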



E-12 



LEARNING EXERCISE 14: PLANNING FOR DATA COLLECTION 



Directions: Assume you are the evaluator for a program and the decision
has been made that evaluation data will be collected from
students, staff, and parents, using a variety of assessment
instruments. If possible, put this in the context of an
ongoing program you have planned for next year. Columns (1)
and (2) on the next page give general background on the types
of instruments that have been selected and the populations from
whom the data are to be collected.

In filling out the Plan for Data Collection on Page E-13,
consider the questions:

Column (3): Who collects the data? Classroom teachers, program evaluator,
other designated staff members, volunteer parents, independent
observer, other

Column (4): Will the data be collected from the target population in groups
or will this be done individually?

Column (5): What will you plan to do about follow-up? If persons are
absent from a group session or do not respond to a survey,
what provisions will you make?






Learning Exercise 14

Plan for Data Collection

(1)                  (2)               (3)               (4)          (5)
Type of              Population on     Person            Group or     Type of
Instrument           Whom Data Are     Responsible for   Individual   Follow-up
                     To Be Collected   Collecting Data   Activity

1. Standardized      students          ____________      __________   __________
   Achievement
   Test

2. Questionnaire     staff             ____________      __________   __________

3. Questionnaire     parents           ____________      __________   __________

4. Classroom         students          ____________      __________   __________
   Observation
   Scale

5. Attendance        staff             ____________      __________   __________
   Log
   (inservice
   training)






Learning Exercise 14

ANSWERS

(1)                  (2)               (3)                       (4)          (5)
Type of              Population on     Person                    Group or     Type of
Instrument           Whom Data Are     Responsible for           Individual   Follow-up
                     To Be Collected   Collecting Data           Activity

1. Standardized      students          classroom teacher         group        at later date
   Achievement
   Test

2. Questionnaire     staff             program evaluator         group*       contact absentee
                                                                 individual   contact nonrespondent

3. Questionnaire     parents           program evaluator         individual   contact nonrespondent
                                       or designated staff
                                       member or volunteering
                                       parent teams

4. Classroom         students          independent observer      group        none
   Observation                         or classroom teacher
   Scale

5. Attendance        staff             program evaluator         group        none
   Log                                 or designated
   (inservice                          staff member
   training)

*Preferred, if convenient, as at a staff meeting



4. MONITORING AND RECORDING DATA ON PROGRAM ACTIVITIES
AND CONDITIONS

One of the important tasks of the evaluation process is the monitoring of
program activities and conditions that affect the implementation of those
activities. If the program staff does not carry out the activities as
planned or if unusual or unplanned events occur, the program results may
be seriously affected and the evaluation report will not accurately reflect
the true situation. The monitoring of program activities is called formative
process evaluation. The observation of conditions affecting the
implementation of program activities is called formative context evaluation.

Formative Process Evaluation 

The major purpose for monitoring planned project activities is to identify
deficiencies in implementation and to develop strategies for making
improvements in the process being followed in time to correct the situation.
Here are some examples of formative process problems:

• Two teachers have decided that the commercial math
materials being used in the program are not doing
a good job so they have begun substituting their
own math.

• Time spent on the teaching of reading varies from
classroom to classroom.

• Three instructional aides are teaching reading.
Aides were not assigned this responsibility,
nor have they had training for this task.

Formative Context Evaluation

The major purpose for observing and notating contextual problems that occur
throughout the operational period of the program is to be able to plan and
introduce alternative actions for alleviating the effects of the problem.

Here are some examples of formative context problems:

• A teacher quits in the middle of the school year.

• The reading resource room is vandalized, and all
the equipment and files have been destroyed.

• An epidemic of chicken pox occurs, and over
half of the pupils are out of school for a
period of four weeks.

Procedures for Collecting Process and Context Data

1. Set up record-keeping forms, such as monitoring
forms and management records. The forms should
be comprehensive, simple, and serve as many purposes
as possible. (Two sample monitoring forms follow.)

2. Establish data-collection procedures. Decide who
will be asked about activities and operations.
Decide when the activities should be monitored.
A master schedule should be developed.

3. Develop procedures for acting upon problem
situations that require change.



SAMPLE MONITORING FORM A

School ______________   Date ______________   Evaluator ______________

ACTIVITY | DOCUMENT AVAILABLE | PERSON RESPONSIBLE FOR IMPLEMENTING ACTIVITY | HELP REQUIRED | FROM WHOM | COMMENTS
         |                    |                                              |               |           |
         |                    |                                              |               |           |



SAMPLE MONITORING FORM B

School ______________   Date ______________   Evaluator ______________

                                   IS ACTIVITY TAKING
                                   PLACE AS PLANNED?     EVIDENCE AVAILABLE

PROGRAM ACTIVITIES                              Not
(List Activities To Be Monitored)  Yes   No    Known     Observation  Records  Conference  Other  None

1.                                 [ ]   [ ]   [ ]       [ ]          [ ]      [ ]         [ ]    [ ]

2.                                 [ ]   [ ]   [ ]       [ ]          [ ]      [ ]         [ ]    [ ]

3.                                 [ ]   [ ]   [ ]       [ ]          [ ]      [ ]         [ ]    [ ]

4.                                 [ ]   [ ]   [ ]       [ ]          [ ]      [ ]         [ ]    [ ]

5.                                 [ ]   [ ]   [ ]       [ ]          [ ]      [ ]         [ ]    [ ]



PROGRAM EVALUATOR'S GUIDE

Section F

ANALYZE EVALUATION DATA

The Evaluation Improvement Program



PRECIS 



A good program evaluation design, pertinent measurement instruments to use
in executing the design, and careful administration of those instruments to
collect the necessary information prepare the way for data analysis. There
is a large array of statistics that can be used: some that are descriptive,
others that are inferential.

Descriptive statistics include measures of central tendency (mean,
median, mode), measures of variability (range, standard deviation, variance),
and distributions that are other than normal (skewed, bimodal, rectangular).
The inferential statistics presented here include t-tests, analysis of
variance, and multiple-regression analysis (which are useful in analyses of
test score data), several statistical tests for treatment of ordered and
ranked data, and the chi-square test for use with category data in testing
frequencies experienced against those expected and to test cross-category
associations or the relationship of two variables.

The presentations in this section are on an introductory level. They
are not meant to make statisticians of readers of this Guide. They will,
however, show what sorts of data analyses are required in those program
evaluations that deal with "hard data" and they suggest implicitly what
kinds of people might be sought to bring a program evaluation satisfactorily
past the data-analysis stage.



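CONTENTS

                                                          Page

INTRODUCTION ............................................ F-1

  A Working Definition

DESCRIPTIVE STATISTICS .................................. F-2

  Measures of Central Tendency

  Measures of Variability ............................... F-8

  Distributions Other Than Normal ....................... F-16

INFERENTIAL STATISTICS .................................. F-18

  Score Data ............................................ F-18

  Ordered Data .......................................... F-19

  Category Data

  Statistical Tests for Score Data ...................... F-19

  Statistical Tests for Ordered Data .................... F-22

  Statistical Tests for Category Data ................... F-25

DATA INTERPRETATION GUIDELINES .......................... F-30

LEARNING EXERCISE 15: COMPUTATION OF χ² ................. F-32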



1. INTRODUCTION

This section serves as a brief introduction to statistics and data analysis.
Because some program evaluators may have received little formal training
in this area, we have prepared the exercises, concepts, and examples with
this in mind. We do not expect that you will learn all about statistics
from this brief treatment. If you complete the section feeling more
comfortable with the statistical notions expressed here, feeling that you
have an intuitive grasp of the concepts and that you have a better idea of
what data analysis can buy in the way of useful information for program
evaluation, then we will have achieved our purpose.

A Working Definition

First, it is necessary to have in mind the difference between descriptive
and inferential statistics.

Types of Statistics

• Descriptive: Numbers that describe a set of data

• Inferential: Numbers which enable one to test
hypotheses and make inferences
about the effectiveness of a program

The following demonstrations will illustrate a number of different descriptive
statistics:



2. DESCRIPTIVE STATISTICS

Coin-Flipping Demonstration

Directions:

1. First, take a penny, or other coin, from your pocket.

2. Flip it 10 times and tally the number of heads and tails in the
space below:

Number of Heads and Tails in 10 Flips

Sample                      Participant's Individual Tally

Heads  | Tails              Heads  | Tails
||||   |                           |





3. Combine your results with others at your table in the space below:

Sample                                 Participant's Group Record

Participant No. | Heads | Tails        Participant No. | Heads | Tails
1               | 4     | 6            1               |       |
2               | 7     | 3            2               |       |
3               | 5     | 5            3               |       |
4               | 2     | 8            4               |       |
5               | 6     | 4            5               |       |
6               |       |              6               |       |
7               |       |              7               |       |
8               |       |              8               |       |
Total           |       |              Total           |       |



At this point, the workshop trainer will get totals from each table
and record them on a master record for the total group.



Theoretically, if a coin is flipped a very large number of times, the
number of heads and tails could be expected to be approximately equal. By
this time, you have probably noticed that there is random variation from
this expected 50/50 split among the individuals sitting at your table and
among the different tables in the room. You have probably also noticed that
as the number of coin flips increases, the deviation from a 50/50 split
becomes less (i.e., your overall table totals come closer to 50/50 than
individual totals in your table group; when all tables are combined, the
total comes closer to 50/50 than when several are combined).
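For readers who would rather let a computer do the flipping, the demonstration can be imitated with a short Python sketch. The function name, the seed, and the numbers of flips are arbitrary choices for illustration; any single run may still show a small batch landing closer to 50/50 than a larger one, since the shrinking deviation is a tendency, not a guarantee.

```python
import random


def heads_proportion(n_flips, rng):
    """Flip a simulated fair coin n_flips times; return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# A seeded generator makes the run repeatable; the seed itself is arbitrary.
rng = random.Random(14)

# One participant (10 flips), one table (80), one workshop (640), and far more.
results = {}
for n_flips in (10, 80, 640, 100_000):
    results[n_flips] = heads_proportion(n_flips, rng)
    deviation = abs(results[n_flips] - 0.5)
    print(f"{n_flips:>7} flips: {results[n_flips]:.3f} heads, "
          f"deviation from a 50/50 split = {deviation:.3f}")
```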

This concept is basic to what happens with test scores. A single test
score of a single student is always an "estimate" of his true score. There
are many reasons why we cannot expect to get an exact score, some related to
the inherent difficulties in making these kinds of measurements and some
related to the specific conditions under which data are collected. However,
when groups of students are combined and as the groups become larger, we can
have increased confidence that the overall group picture is a better
representation of that group's status.

This is also related to the concept of randomness. Those persons who
got 1/9, 2/8, or 3/7 splits on flipping their coins deviated randomly from
the expected 5/5 split. However, there was a tendency for those to be
balanced out by persons who got 9/1, 8/2, or 7/3 splits. This is how random
selection of students and random assignment to groups can assure that certain
factors not controlled by the design are controlled through random selection
or random assignment.

At this point, we have certain basic raw data but they are not yet
organized in a very systematic fashion. Now we will organize the data into
a frequency distribution, display them graphically, and get some descriptive
statistics. Using just the information on number of heads will demonstrate
the point.






For your future reference, copy in the frequency column the numbers
compiled by the trainer for the total group.

Value (No. Heads)
       X           Frequency

      10           ________
       9           ________
       8           ________
       7           ________
       6           ________
       5           ________
       4           ________
       3           ________
       2           ________
       1           ________
       0           ________

    Total          ________

Mean (x̄)      ________
Median (Md)   ________
Mode (Mo)     ________
Range         ________



6. Using the frequencies obtained from the total workshop group, draw
a line graph below following the instructions given.

Instructions:

a. What was the highest number of heads anyone had? How many
persons had that number?

b. Place an x in the graph opposite these two numbers on the
respective scales.

c. Do this for each value and frequency.

d. Connect the x's with a line.



[Blank graph: vertical axis, frequency, marked 1 to 12; horizontal axis, Number of Heads]



Measures of Central Tendency 




There are a number of conventional descriptive statistics which may be
used to describe the distribution we just graphed. One is a measure of
central tendency. If we select one value which typifies the group, we
would select one towards the middle of the range. Three such statistics
are commonly used: the mean, median, and mode. The mean is simply the
arithmetic average: the sum of all the scores divided by the number of
persons in the group. In the coin-flipping exercise just completed, to get
the mean we could have just listed each person's "score," one after the
other, added them, and divided by the total number in the group. However,
this becomes unwieldy, especially as the size of the group increases.
Instead, we have prepared a frequency distribution. When data are in this
form, the score values must be taken into account. To do this, multiply
each score value by the frequency of that score, take the sum of these
products, and divide by the number of persons.
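The frequency-distribution shortcut just described can be written out as a small Python sketch. The function name and the frequencies are invented for illustration; they are not the workshop's actual results.

```python
def mean_from_frequencies(frequency):
    """Mean of a frequency distribution given as {score value: number of persons}."""
    number_of_persons = sum(frequency.values())
    if number_of_persons == 0:
        raise ValueError("the distribution contains no persons")
    # Multiply each score value by its frequency, then add the products.
    weighted_sum = sum(value * count for value, count in frequency.items())
    return weighted_sum / number_of_persons


# Hypothetical group of 10 participants: number of heads -> number of persons.
heads_frequency = {3: 1, 4: 2, 5: 4, 6: 2, 7: 1}

# (3*1 + 4*2 + 5*4 + 6*2 + 7*1) / 10 = 50 / 10
group_mean = mean_from_frequencies(heads_frequency)
print(f"Mean number of heads: {group_mean}")
```

Listing all ten scores one after the other and averaging them would give the same answer; the frequency form simply saves writing when the group is large.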



The median is the middle value in a set of scores arranged in ascending
or descending order. Count up from the bottom or down from the top. If
there is an even number of scores, the median is the average of the two
middle scores.

The mode is the most frequently occurring value. Sometimes the distribution
of test scores will be bimodal; this is often seen in heterogeneous classes
where there is a proportionately high number of bilingual students whose
command of English is not as good as that of native-speaking English
students.
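The rules for the median and the mode can be tried out with Python's standard `statistics` module. The score lists below are made up for illustration only.

```python
import statistics

# Odd number of scores: the median is the single middle value.
odd_scores = [3, 4, 5, 6, 7]
median_odd = statistics.median(odd_scores)

# Even number of scores: the median is the average of the two middle values.
even_scores = [3, 4, 5, 6, 7, 8]
median_even = statistics.median(even_scores)

# multimode() lists every value tied for most frequent, so a list with
# two equally common values shows up as two modes.
class_scores = [60, 60, 70, 85, 85]
modes = statistics.multimode(class_scores)

print(f"Median of {odd_scores}: {median_odd}")
print(f"Median of {even_scores}: {median_even}")
print(f"Modes of {class_scores}: {modes}")
```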



[Figure: Bimodal distribution]



Just a few extremely high or extremely low scores can sometimes affect the
mean substantially. For this reason, the median is sometimes preferred.
Suppose, for example, in an achievement test, a class of 25 students (Case 1)
obtained the following scores (out of 100):



RESULTS OF ACHIEVEMENT TEST
N=25

Case 1                        Case 2

Student No. | Score           Student No. | Score
1           | 98              1           | 98
2           | 98              2           | 98
3           | 97              3           | 97
4           | 91              4           | 91
5           | 80              5           | 80
6           | 79              6           | 79
7           | 78              7           | 78
8           | 77              8           | 77
9           | 75              9           | 75
10          | 75              10          | 75
11          | 73              11          | 73
12          | 72              12          | 72
13          | 71              13          | 71
14          | 71              14          | 71
15          | 70              15          | 70
16          | 68              16          | 68
17          | 68              17          | 68
18          | 68              18          | 68
19          | 67              19          | 67
20          | 65              20          | 65
21          | 55              21          | 65
22          | 63              22          | 63
23          | 61              23          | 21
24          | 60              24          | 20
25          | 60              25          | 20

Range = 98-60 = 38            Range = 98-20 = 78
Median = 71                   Median = 71
Mean = 73.6                   Mean = 68.8
Mode = 68                     Mode = 68





The median (in this example, the 13th score) is 71. The mean is 73.6. But
now suppose that the lowest three scores are 21, 20, and 20 instead of 61,
60, and 60 (Case 2). The median remains the same, but the mean is now 68.8,
a drop of 4.8 points. Sixty percent of the pupils scored 70 or better, but
the mean does not reflect that. It has been affected by three
uncharacteristically low scores. In this illustration, the median would be
a better indicator of central tendency than the mean.
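The same effect can be seen on a much smaller scale. The sketch below uses five hypothetical scores (not the 25 scores in the table) and changes only the lowest one. In the class of 25, lowering three scores by 40 points each removes 120 points from the total, and 120 divided by 25 is the 4.8-point drop in the mean.

```python
import statistics

# Hypothetical five-student class; only the lowest score differs.
case_1 = [60, 65, 70, 75, 80]
case_2 = [20, 65, 70, 75, 80]

mean_1, median_1 = statistics.mean(case_1), statistics.median(case_1)
mean_2, median_2 = statistics.mean(case_2), statistics.median(case_2)

# One score fell by 40 points, so the mean falls by 40 / 5 = 8 points,
# while the middle (third) score is untouched.
print(f"Case 1: mean = {mean_1}, median = {median_1}")
print(f"Case 2: mean = {mean_2}, median = {median_2}")
```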



The three measures of central tendency will vary in their
relationship to each other depending on the shape of the distribution.
The two illustrations below demonstrate this:

[Figure: two skewed distributions. In the first, the order along the score axis is Mo, Md, X; in the second, the order is X, Md, Mo.]

X = Mean
Md = Median
Mo = Mode



Measures of Variability

Giving the central tendency is necessary but not sufficient to adequately
describe a set of data. The amount of variation in scores is also important
to consider. Three such statistics will be discussed: the range, standard
deviation, and the variance.

The range. In the example we just used to demonstrate the effect of
extreme scores on the mean, the range of scores for the first case was
98-60 or 38 and for the second case it was 98-20 or 78.

Here are some other examples of measures of variability:

[Figure: score distributions with a mean of 50; horizontal axis: SCORE]



The standard deviation. Another statistic that will help us as we look
at a distribution of scores is the standard deviation. This measure tells us
more about how the scores spread themselves around the mean or average score.
The closer the scores cluster around the mean, the smaller the standard
deviation.

Continuing with the same three cases, the figures on F-11 show
how the size of the standard deviation from the mean reflects the spread or
variability of scores. The more spread out or variable the scores, the
larger the standard deviation.
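A quick way to get a feel for this is to compute the standard deviation of two sets of scores that share a mean but differ in spread. The scores below are invented, and the sketch assumes the "population" form of the formula (`statistics.pstdev`, which divides by the number of scores); the Guide does not say which form it has in mind.

```python
import statistics

# Two hypothetical sets of five scores, both with a mean of 50.
clustered = [48, 49, 50, 51, 52]
spread_out = [30, 40, 50, 60, 70]

for label, scores in (("clustered", clustered), ("spread out", spread_out)):
    mean = statistics.mean(scores)
    # pstdev: square root of the average squared distance from the mean.
    sd = statistics.pstdev(scores)
    print(f"{label:>10}: mean = {mean}, standard deviation = {sd:.2f}")

sd_clustered = statistics.pstdev(clustered)
sd_spread_out = statistics.pstdev(spread_out)
```

The second set is ten times as spread out as the first, and its standard deviation is ten times as large.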

The familiar bell-shaped curve, or normal distribution, shown on
page F-12, forms the basis for making statistical interpretations, and there
are known relationships between standard deviation units and the percent of
cases falling within those units.

In the theoretical curve, 68 percent of all scores lie between (+) and
(-) one standard deviation; 95 percent of the scores lie between (+) and (-)
two standard deviations, and almost all lie within (+) and (-) three standard
deviations. This relationship enables us to determine the likelihood that
two or more groups, or two or more sets of scores obtained at different
times, are significantly different from each other.

The normal distribution does not exist in nature. It is an idealized
mathematical distribution which approximates many "real" distributions that
are found in nature. Its usefulness lies in the fact that known percents
lie within given standard deviation units.

In Case 1, with a mean of 50 and a standard deviation of 15, 68 percent
of the scores would fall between 35 (mean - 1 S.D.) and 65 (mean + 1 S.D.).
Further, 95 percent of the scores would fall between 20 and 80, while
more than 99 percent of the scores would fall between 5 and 95.

If two individuals in Case 1 made raw scores of 50 and 65, their
percentile ranks (P.R.) would be 50 and 84. These same raw scores in
Case 3 would yield P.R.s of 50 Hi.J. i9 . ' 
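The percentages tied to standard deviation units can be checked with the `NormalDist` class in Python's standard `statistics` module. The sketch below uses only the Case 1 figures given in the text (mean 50, standard deviation 15); the helper names are invented for illustration. Under a normal curve, a raw score one standard deviation above the mean sits at about the 84th percentile.

```python
from statistics import NormalDist

# Case 1 as described in the text: mean 50, standard deviation 15.
case_1 = NormalDist(mu=50, sigma=15)


def share_within(dist, k):
    """Return (low, high, share of scores) for mean plus or minus k S.D."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return low, high, dist.cdf(high) - dist.cdf(low)


def percentile_rank(dist, raw_score):
    """Percent of scores falling below raw_score, rounded to a whole number."""
    return round(100 * dist.cdf(raw_score))


for k in (1, 2, 3):
    low, high, share = share_within(case_1, k)
    print(f"Between {low:.0f} and {high:.0f}: {share:.1%} of scores")

for raw_score in (50, 65):
    print(f"Raw score {raw_score}: percentile rank {percentile_rank(case_1, raw_score)}")
```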



[Figure: the normal curve, showing standard deviations, percentile equivalents, typical standard scores, stanines, Wechsler Scales subtests, and deviation IQs]

The Psychological Corporation, Test Service Bulletin No. 48, January 1955.
Printed by permission of the publisher, Psychological Corporation.



Note also the relationship among various kinds of scores discussed
in the previous section on instruments. In the figure on F-12, the unequal
units on the percentile scale can be easily seen. This has important
implications for the kinds of statistical tests that can be used. Four
different kinds of standard scores are shown, each with a different mean
and standard deviation. Stanines are also shown in their relationship to
other kinds of scores.

Now, looking at the curves in the three different cases, notice
they all have the same mean, median, and mode. But the spread of scores
differs markedly across the different cases. So far, we've just talked in
the abstract about the spread of the scores. Now let's consider an example.

As an evaluator, you are collecting a variety of
measures by which you intend to see to what extent
your program has met its objectives. For example,
one anticipated outcome is an increase in
achievement. To measure this, suppose you administered a
standardized achievement test at the beginning and
end of the year.

The mean on the pretest was 67; the posttest
mean was 70, a mean increase of 3 points. That is
a change, but can you now say, "The program was
a success. Our kids gained three points on an
achievement test administered on a pre-post basis."?

If you answered that question "Yes," you were mistaken. Think again.
Remember when we changed the last three scores in our table, "Results
of Achievement Test," the mean dropped from 73.6 to 68.8, a drop of 4.8
points? Yet only three scores changed! Now think of the first 25 scores
as a pretest and the second set as a posttest. You would think twice about
reporting a loss of 4.8 points without some careful examination of the
data. So be as skeptical about that increase of 3 points in the above
example as you are, rightfully, about that loss shown in Case 2 of the
table.

The variance. The variance is closely related to the standard deviation;
mathematically, it is simply the square of the standard deviation.

Statistical tests that are made to see if there has been "real" gain
between pre- and posttests or those made to see if "real" differences exist
between two or more groups are called tests of significance. They examine
the difference between means in relation to the variance of the groups and
then use the normal curve or other theoretical curves to interpret the
results.

Consider what happens when we want to compare two sets of scores, such
as a pretest and a posttest. Ideally we would like to see two nonoverlapping
sets of scores:

[Figure: pretest and posttest score distributions with no overlap]

The lowest score on the posttest is higher than the highest score on
the pretest. Clearly, there has been a significant change. Unfortunately,
real data do not usually behave this way. Usually there will be less or
more overlap in scores, like this:

[Figure: overlapping pretest and posttest score distributions]

or this:

[Figure: overlapping pretest and posttest score distributions]

As the overlap between pretest and posttest scores increases, we become
less sure that a "real" change has occurred. Inferential statistics looks at
the amount of change in relation to the variance and gives the probability
that a "real" change has occurred.



The visual comparison below readily illustrates why variance is
important.

[Figure: two group distributions whose means are 20 units apart]

Mean difference of twenty units between
two groups with relatively small standard
deviations

[Figure: two group distributions whose means are 20 units apart]

Mean difference of twenty units between
two groups with relatively large standard
deviations
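One rough way to put a number on this comparison is to express the difference between two means in standard deviation units. The sketch below is an illustration only, not one of the significance tests named in this Guide: the function name and the tiny groups of three scores are hypothetical, and the pooled standard deviation assumes groups of equal size.

```python
import math
import statistics


def difference_in_sd_units(group_a, group_b):
    """Mean difference divided by a pooled standard deviation (equal-sized groups)."""
    if len(group_a) != len(group_b) or len(group_a) < 2:
        raise ValueError("groups must be the same size, with at least two scores each")
    mean_difference = statistics.mean(group_b) - statistics.mean(group_a)
    # Average the two sample variances, then take the square root.
    pooled_sd = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    )
    return mean_difference / pooled_sd


# Both pairs of groups have means of 50 and 70, a difference of twenty units.
small_spread = difference_in_sd_units([48, 50, 52], [68, 70, 72])
large_spread = difference_in_sd_units([20, 50, 80], [40, 70, 100])

print(f"Small standard deviations: {small_spread:.2f} S.D. units apart")
print(f"Large standard deviations: {large_spread:.2f} S.D. units apart")
```

The same twenty-unit difference amounts to many standard deviations when scores cluster tightly and to only a fraction of one when they are widely spread.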






Distributions Other Than Normal

While test scores from norm-referenced tests generally distribute themselves
somewhat like the theoretical normal curves we have been discussing, not all
scores do. A basic underlying assumption for many tests of significance is
that the data are distributed normally. Before proceeding with any planned
analysis, it is a good idea to draw a graph and look at the way the data
do distribute themselves. If in doubt, there are procedures for testing
whether a given distribution departs too far from normality for you to use a
statistic based on a normal distribution. Here are some other than normal
distributions you may run into:

[Figures: Negatively Skewed (left); Positively Skewed (right)]

If a test is too easy, your scores may look like the figure on the left.
There was not enough "ceiling" on the test to adequately differentiate among
the best students. If a test is too difficult, the scores may look like the
figure on the right with little differentiation among the poorer students.
Most norm-referenced tests do not measure well at either extreme of the range
for which they are intended.

[Figures: Bimodal (left); Rectangular (right)]

A bimodal distribution, mentioned earlier, is one which has scores heavily
concentrated in two distinct parts of the scale. This may tend to happen in
some bilingual programs if you have a mix of native English-speaking and
native Spanish-speaking students, neither of whom has good command of the
other's language. One way to treat this problem is to consider each group
separately in the analysis.
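The advice to draw a graph and look at the data before choosing an analysis does not require special equipment. The Python sketch below prints a rough sideways histogram; the function name, the bin width of 10, and the scores (an invented "too easy" test) are assumptions made for illustration.

```python
from collections import Counter


def text_histogram(scores, bin_width=10):
    """Return one text line per score interval, with one x per score in it."""
    counts = Counter((score // bin_width) * bin_width for score in scores)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        end = start + bin_width - 1
        # A Counter returns 0 for intervals in which no score fell.
        lines.append(f"{start:>3}-{end:<3} {'x' * counts[start]}")
    return lines


# Hypothetical whole-number scores from a test that was too easy:
# most pupils pile up near the top of the scale.
easy_test_scores = [95, 92, 98, 88, 91, 85, 79, 96, 93, 67, 89, 97]

histogram = text_histogram(easy_test_scores)
for line in histogram:
    print(line)
```

With these scores the longest row is the 90-99 interval and the rows thin out toward the low end, which is the negatively skewed shape described above.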



A rectangular distribution will be obtained if you plot percentile
scores for a group that is similar to the norm group. This is a function
of the percentile scale. The kinds of program evaluation questions you
want to answer and the kinds of data you use to develop the answers will
guide you in the selection of which statistics to use.

In summary, you have been introduced to the following descriptive
statistics:





Descriptive Statistics

Frequency            The number of times a given value occurs

Mean                 The average value; the sum of all values
                     divided by the total number of values

Median               The middle value in a distribution
                     arranged in order from high to low, or
                     from low to high

Mode                 The most frequently occurring value

Range                The difference between the highest and
                     lowest value

Standard Deviation   A measure showing how scores spread
                     themselves around the mean

Variance             The square of the standard deviation, the
                     basis for most inferential tests of
                     significance







3. INFERENTIAL STATISTICS

Inferential statistics provide a way to test the significance of results
obtained when data are collected. As noted in the discussion on
descriptive statistics, all measurement is subject to error (due to inherent
difficulties in measuring behavior and to specific testing conditions) and
to random fluctuation (due to the particular persons included in the sample).
Inferential statistics provide a way to separate chance errors and random
fluctuation from real changes.

In selecting a particular statistical test, it is important to know what
kinds of data you are dealing with and what the basic assumptions are about
those data. Program evaluators will encounter three basic kinds of data:

• score data

• ordered data

• category data

Score Data

Each person has a numerical score that represents his or her performance or
behavior. These are the kinds of data that come from any standardized test.*
Techniques used with score data make some rather stringent assumptions:

1. Intervals between scores are equal; that is, differences
between scores at one point on the scale are equal to the
same size differences at any other point on the scale.
(Note: Percentiles and grade equivalents do not qualify
on this point.)

2. The scores are assumed to be normally distributed within
the population from which they are drawn.

3. The variances in two or more groups being compared are
the same.**

*Percentile scores excepted

**Actually, the assumption is that the variances in the populations are
the same, where population is defined as the total group to which you
can generalize.

However, empirical studies have shown that some violation of these
assumptions by some tests does not impair their usefulness.

Ordered Data

Each person is assigned a rank that represents his position along a scale.
If you have five different tests you are considering for adoption and ask a
committee to place them in order from the most preferred to the least
preferred, you have ordered data. Sometimes score data can be treated as
ordered data, particularly if there is reason to believe the assumptions
underlying score data have been badly violated. Statistical tests for
ordered data do not make stringent assumptions.

Category Data

Each person is counted as belonging in a particular classification or
category. Number of parents for a bond increase election and number of
parents against the election constitute category data. Or a comparison
which involves numbers in different ethnic groups gives category data.

Statistical Tests for Score Data

If the data you have can meet the assumptions underlying tests for score
data, there are many different and potentially powerful tests that can be
used. Most inferential tests for score data require special training for
their proper selection and use. Unless the program evaluator has had this
training, he or she is advised to seek the help of someone who has.

Several commonly used tests will be mentioned, but no effort will be
made to teach the computational routines. Program evaluators who have
access to a computer center may wish to seek assistance from that source
once the decisions have been made as to what kinds of analyses are
appropriate. Do not expect computer people to help you decide what analysis
is most appropriate. They may be statisticians as well as data processors
but most are not.



The t-tests. A t-test compares two means (pretest vs. posttest or
group 1 vs. group 2) to determine whether "real" differences exist. There
are several variations in computational routines depending upon the kinds
of data being used. In order to select the appropriate computational
routine for t-tests, the evaluator must know:

     1.  Whether the groups being compared are independent or
         correlated. If you have two measures on each person
         (pre- and posttest), the groups are correlated. If
         you are comparing two different groups, use the
         t-test for uncorrelated means.

     2.  Sample size. N = 30 for each group being compared
         is the generally accepted lower limit for using the
         t-test. For smaller groups, one of the tests for
         ordered data may be more appropriate. A description
         of this type of inferential test will be found below.

     3.  Whether the variances differ markedly. Unless the
         variances of the two groups are similar, use of the
         t-test is questionable. A separate test can be made
         to determine whether unequal variances are a problem.

Note: A t-test used under pre- and posttest conditions must take into
account what the expected gain would have been without the special program.
Given no special program, average students are expected to gain one month
for each month of instruction. To demonstrate superiority of a special
program, it should produce gains beyond those expected in the absence of
the program. Expectations for an educationally deprived group may be only
one-half year for each school year. Past growth history of pupils involved
can help determine what this expectation is.
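
For readers who want to see the arithmetic, the sketch below shows one
common way the two situations described above could be computed: a t
statistic for correlated (pre- and posttest) scores, tested against an
expected gain, and a pooled-variance t statistic for two uncorrelated
groups. The function names, the `expected_gain` parameter, and the scores
are illustrative assumptions and do not come from this guide; the toy
groups are far smaller than the N = 30 guideline and serve only to show the
calculation. The result would still have to be referred to a t table.

```python
import math
import statistics


def paired_t(pre, post, expected_gain=0.0):
    """t statistic for correlated (pre/post) scores, tested against an expected gain."""
    gains = [after - before for before, after in zip(pre, post)]
    standard_error = statistics.stdev(gains) / math.sqrt(len(gains))
    return (statistics.mean(gains) - expected_gain) / standard_error


def independent_t(group1, group2):
    """Pooled-variance t statistic for two uncorrelated groups."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * statistics.variance(group1)
              + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / standard_error


# Hypothetical scores, used only to show the arithmetic.
pre = [10, 12, 9, 14, 11]
post = [12, 15, 10, 15, 13]

t_raw = paired_t(pre, post)                     # gain tested against zero
t_adj = paired_t(pre, post, expected_gain=1.0)  # gain beyond an expected 1.0
```

Testing against an expected gain rather than zero makes the statistic
smaller, which is the point of the note above: part of any observed gain
would have occurred without the special program.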

Analysis of variance. In its simplest form, analysis of variance is
used when you wish to find out if differences exist in more than two groups.
This is a practical method for program evaluators to use. 

Analysis of variance can also be used when you wish to examine various
factors that may be affecting a program (instructional method, amount of
time devoted to instruction, use of teacher aides). This is called factorial
design and is potentially a very powerful tool. Unfortunately, its
usefulness is somewhat limited by the need to assign either pupils or
classes randomly to each possible combination of all variables being
investigated. For example:



                          Method 1                Method 2
                     30 min.    45 min.      30 min.    45 min.

     Aides
     Present

     Aides
     Absent





In this very simple design, you would need to set up 8 different
situations (method 1, 30 minutes of instruction, aides present; method 1,
30 minutes of instruction, aides absent; etc.) and then randomly place
students into each of the eight situations. Or you could assign intact
classrooms (those not divided into subgroups) to each of the eight
conditions. But just one classroom per situation is not sufficient to take
into account the teacher variable. For this reason, powerful as they are,
factorial designs may not be very practical for program evaluation.
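
The eight situations and the random placement step can be sketched as
follows. This is only an illustration: the 30- and 45-minute levels are
taken as read from the design diagram, and the roster of 240 pupils, the
`assign_randomly` helper, and the fixed seed are hypothetical.

```python
import itertools
import random

methods = ["Method 1", "Method 2"]
minutes = [30, 45]
aides = ["Aides present", "Aides absent"]

# Every combination of the three factors: 2 x 2 x 2 = 8 situations.
cells = list(itertools.product(methods, minutes, aides))


def assign_randomly(units, cells, seed=0):
    """Shuffle the units, then deal them out so cell sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {cell: [] for cell in cells}
    for position, unit in enumerate(shuffled):
        assignment[cells[position % len(cells)]].append(unit)
    return assignment


pupils = [f"pupil_{number:03d}" for number in range(1, 241)]  # hypothetical roster
plan = assign_randomly(pupils, cells)
```

The same helper could be handed a list of intact classrooms instead of
pupils, although, as noted above, one classroom per situation does not
separate the effect of the teacher from the effect of the treatment.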

Multiple-regression analysis. Multiple-regression analysis deals with
prediction. In the case of program evaluation, it might be nice to know which
pupils would benefit most from certain instructional units or which
combination of program characteristics produces the greatest student
achievement.

To set up a multiple-regression analysis, the program evaluator must do
the following:

     1.  Identify a suitable criterion that is acceptable evidence of
         achievement. (End-of-the-year achievement test may serve very
         well.)

     2.  Identify a set of predictors: those things that either preexist
         or measurements that will be taken during the year that you think
         will affect student outcome. Preexisting factors may be such
         things as age, sex, general ability, socioeconomic status, grades
         in related courses, etc. Predictive measurements taken during the
         year may be test scores on units of instruction, teacher judgment
         about pupil progress, pupil self-evaluation, and the like.

Statistical Tests for Ordered Data

Ordered data may be obtained in two basic ways. First: No numerical scores
are obtained, but you are able to place persons or objects along some
dimension of interest (as when a committee reviews five textbooks up for
adoption and can make a series of decisions as to which is most preferred,
which is least preferred, and which fit in between). Second (and most
common): You have obtained numerical scores but feel the scores are not
precise enough to meet the assumptions to use tests for score data. If you
must convert score data to ordered data, be aware there are standard
conventions for dealing with this:



     Scores    Ranks

       13       1
       12       2
       11       3.5      If two scores are equal, the average
       11       3.5      rank (3 + 4) ÷ 2 is assigned.
        9       5        Rank 5 (not 4) is assigned to next score.


     Scores    Ranks

       13       1
       12       2
       11       4        If 3 scores are equal, the average
       11       4        rank (3 + 4 + 5) ÷ 3 is assigned.
       11       4
        9       6        Rank 6 (not 5) is assigned to next score.
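
The tie convention can be written out as a short routine. This is a sketch
rather than a prescribed procedure: the name `average_ranks` and the
`descending` switch are assumptions made for illustration. With
`descending=True` the highest score receives rank 1, as in the two small
tables above; with `descending=False` the lowest score receives rank 1.

```python
def average_ranks(scores, descending=True):
    """Rank the scores, giving tied scores the average of the ranks they occupy."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=descending)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        # Extend the run while the next score equals the current one.
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = ((start + 1) + (end + 1)) / 2  # mean of ranks start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


two_tied = average_ranks([13, 12, 11, 11, 9])
three_tied = average_ranks([13, 12, 11, 11, 11, 9])
```

Because the score after a tie takes the next unused rank, the ranks always
add up to the same total they would have had without ties.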



The Sign Test. The Sign Test can be used to determine whether changes
have occurred between two different points in time. For example, suppose an
evaluator wants to determine the effectiveness of a new unit on citizenship
designed to encourage pupils to take a more active interest in a coming
community election. The evaluator rates each student on a scale of 1 to 10
before instruction begins by getting information on such things as his or
her knowledge of who is running for office, what the issues are, how much
time is spent watching local TV newscasts or reading about the election in
local newspapers. After the unit, the measures are repeated, and new values
on a scale of 1 to 10 are assigned.
















Data are recorded as follows:

          Pre- and Posttest Scores on Community Election Unit

     Pupil No.      Pretest      Posttest      Change*

       1 - 6        (entries illegible in this copy)
         7             1            5             +
         8             4            5             +
         9             3            7             +
        10             2            1             -





Note: Pupils who do not change are eliminated from the table.

The test consists of counting the number of changes, noting the
total number of students who change in either direction, and consulting a
table designed for this test.** In this case, the change is not significant.
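
Where a printed table is not at hand, the probability can be computed
directly. The sketch below assumes the usual exact binomial form of the
sign test (each change is equally likely to be + or - if the unit had no
effect); the guide itself only refers to a table, so this is a stand-in
and not the authors' routine. The counts of 7 plus and 3 minus are
hypothetical and are not read from the table above.

```python
from math import comb


def sign_test_p(plus, minus):
    """Two-sided exact binomial probability; pupils with no change are already dropped."""
    n = plus + minus
    smaller = min(plus, minus)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_value = sign_test_p(plus=7, minus=3)  # hypothetical counts
significant_at_05 = p_value < 0.05
```

With only ten pupils who changed, even a 7-to-3 split falls well short of
the .05 level, which illustrates why small groups seldom yield significant
sign tests.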

The Kruskal-Wallis Test. This test can be used to determine whether
there are differences among groups.

Suppose the evaluator wants to examine the self-concept of students in
three different groups (those who have had two years of compensatory
education, those who have had one year, and those who have had no
compensatory education). The evaluator gives a self-concept measure and
converts the scores to ranks. Data are recorded as follows:



 * + and - are considered a form of ordered data.

** For complete description, see Linton, M. & Gallo, P. S. Jr., The practical
   statistician: Simplified handbook of statistics. Monterey, CA: Brooks/
   Cole, 1975.










                    ... to Compensatory Education

           2 Years              1 Year                None
        Score    Rank        Score    Rank        Score    Rank

         18       3.5         12       1           18       3.5
         28       8           16       2           21       5
         32       9           37      11           26       6.5
         46      13.5         40      12           26       6.5
         52      16.5         46      13.5         33      10
         62      20           52      16.5         51      15
         63      21.5         61      19           53      18
                              63      21.5         68      23
                                                   70      24

        T1 = 92.0            T2 = 96.5            T3 = 111.5
        n1 = 7               n2 = 8               n3 = 9






For the computation-minded, your workshop trainer can give you the
procedure to follow, or consult Linton and Gallo cited on page F-23.
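
As a rough guide to what that procedure involves, the sketch below uses
the standard Kruskal-Wallis formula, H = 12 / (N(N + 1)) × Σ(T²/n) −
3(N + 1), applied to the rank sums reported above (92.0, 96.5, 111.5). The
group sizes of 7 and 8 are counted from the table and n = 9 is printed
there; the function name is an assumption, and no correction for tied
ranks is applied.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N (N + 1)) * sum(T_i^2 / n_i) - 3 (N + 1), with no tie correction."""
    n_total = sum(sizes)
    weighted = sum(t * t / n for t, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


h = kruskal_wallis_h([92.0, 96.5, 111.5], [7, 8, 9])
degrees_of_freedom = 3 - 1  # number of groups minus one
```

H is conventionally referred to a chi-square table with degrees of freedom
equal to the number of groups minus one. Here H is roughly 0.09, far below
the 6.0 required at the .05 level for two degrees of freedom, so these
ranks give no evidence that the three groups differ.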







Two other relatively simple tests that can be used with ordered data are
as follows:

The Rank Sums Test. This is similar to the Kruskal-Wallis Test, but can
be used when there are only two groups to be compared.

The Friedman Test. The Friedman Test is appropriate when more than two
measurements are made on the same persons at different times.

                         Tests for Ordered Data

     Test                     Use

     Sign Test                Tests pre- and postmeasurements
                              on a single group.

     Rank Sums Test           Tests differences between two groups.

     Kruskal-Wallis Test      Tests differences among three or
                              more groups.

     Friedman Test            Tests differences when three or more
                              common measurements are made on the
                              same persons over time.



Statistical Tests for Category Data

The most commonly used test for category data is the chi-square (χ²) test.
However, it may be used in a number of different ways for different purposes.
The two most common uses of this statistic for the program evaluator will be
1) to test the deviation of obtained frequencies against some a priori set of
expected frequencies, and 2) as a test of association.

Deviation from expected frequencies. To return to our seventh-grade
experimental reading program, suppose one of the objectives deals
with the attitudes of students in the program. An attitudinal questionnaire
is given at the end of the year to see how the group felt about the program.
One of the questions the evaluator asks is:

     All things considered, did you enjoy the experimental reading program?

The students respond "Yes" or "No."
If the students really have no predisposition toward the program one way
or the other, we would expect that about half of them would reply "Yes" and
about half of them would reply "No." If the overall response is generally
positive, we would expect more than half to reply positively.

Suppose that out of 100 students sampled, 65 students said "Yes"
(they enjoyed the experimental reading program) and 35 said "No." Is 65
enough greater than 50 to conclude that the overall response is generally
positive and that it did not just occur by chance?

The statistical question is: Is a 65/35 split significantly different
from the 50/50 expected by chance if students really have no
predisposition one way or the other? The χ² test may be used to answer this
question. The first step is to construct the table:

          Number of Students Who Responded "Yes" and "No"

                                 Yes       No       Total

     Observed Frequency           65       35        100

     Expected Frequency           50       50        100







For these data, χ² = 8.41. (Those interested in learning how to compute
chi-square [χ²] should see the Learning Exercise that begins on page F-32.)

This result must now be referred to a table to determine whether it may
be considered significant. To use the table, it is necessary to know the
degrees of freedom for this problem and to select a level of significance.
The concept of degrees of freedom is related to the number of categories
being treated. For this kind of problem, the number of degrees of freedom
is one less than the number of categories. Since there are two categories,
there is one degree of freedom.

The selection of level of significance is somewhat arbitrary and indicates
the amount of risk the evaluator is willing to take. The greater the magnitude
of an observed difference in relation to the variability of the score involved,
the more likely it is that a real and significant difference does exist. It is
statistically possible to state the chances that an observed difference is a
real one or one due to chance. In our example, we will select the .05 level of
significance. This means the evaluator is willing to run the risk of being
wrong five times in 100 if he assumes that all differences larger than the one
read from the table are considered to be real and significant. It is also
important to note that sample size becomes important when attempting to
establish whether or not there is statistical significance. The larger the
number of observations, the greater the opportunity is for the effects of
chance to be reduced.

Now look at a portion of the table on significance levels for χ².

          Portion of Table Showing Significance Levels for χ²

     Degrees of                  Significance Levels
     Freedom         .25      .10      .05      .025     .01      .005

         1           1.3      2.7      3.8      5.0      6.6      7.9
         2           2.8      4.6      6.0      7.4      9.2     10.6
         3           4.1      6.3      7.8      9.4     11.3     12.8
         4           5.4      7.8      9.5     11.1     13.3     14.9



This table gives values for significance levels from .25 to .005. Since
we selected the .05 level and we have one degree of freedom, the value of
interest to us is 3.8. In order for the difference found in our problem to
be considered significant, our χ² value has to be greater than 3.8. Since
our value is 8.41, we can conclude that students in this group really do
have a generally favorable attitude toward the program. In fact, our value
is greater than 7.9, the value given at the .005 level of confidence. A
value as large as 8.41 occurs less than once in 200 times.
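
The figure of 8.41 can be reproduced with a few lines of arithmetic. One
assumption is needed: the text does not say how 8.41 was obtained, but it
results when each difference between observed and expected frequency is
reduced by 0.5 before squaring (a continuity correction); without that
adjustment the same data give 9.0. The function name and the `correction`
parameter are illustrative.

```python
def chi_square_expected(observed, expected, correction=0.5):
    """Sum of (|o - e| - correction)^2 / e over the categories."""
    return sum((abs(o - e) - correction) ** 2 / e
               for o, e in zip(observed, expected))


chi2 = chi_square_expected([65, 35], [50, 50])
uncorrected = chi_square_expected([65, 35], [50, 50], correction=0.0)

# Critical values for one degree of freedom, copied from the table excerpt.
CRITICAL_DF1 = {0.05: 3.8, 0.005: 7.9}
significant_at_05 = chi2 > CRITICAL_DF1[0.05]
beyond_005 = chi2 > CRITICAL_DF1[0.005]
```

Either version of the statistic exceeds 7.9, so the conclusion drawn above
does not depend on whether the correction is applied.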

χ² as a test of association. The most frequent use of χ² is as a test
of association. This test will tell you whether or not there is a
relationship between two variables. For example, you may survey your
community to get information about whether they would support a tax increase
to provide additional school services. Because having or not having children
in school may influence the vote, you want to analyze your data to see if
there is a relationship between the responses you got and having children in
school.



                     Responses on Tax Increase Issue

                            Have Children     Do Not Have
                              in School        Children          Total
                                               in School

     Approve Increase
     in Taxes                     60              20               80

     Do Not Approve
     Increase in Taxes            30              40               70

     Total                        90              60              150



For these data, χ² = 14.76, again a highly significant value (see
computation on page F-32). We can conclude that there is a definite
relationship between having children in school and willingness to support a
tax increase. Course of action: Get the parents out to vote!



χ² as a test of association can accommodate more than two levels for each
variable, provided certain conditions are met. In the above example, there
could have been three categories of response: "for," "against," and
"undecided." Or you may have wanted to do the analysis by age group of
respondent (21-35, 36-50, over 50), or by some economic index (high, medium,
and low), or by ethnic group (white, black, Chicano, other). The degrees of
freedom change as the dimensions of the table change and are equal to

     df = (r-1)(c-1),

where r equals number of rows and c equals number of columns. For a 3 x 4
table, df = (3-1)(4-1) = 6. So long as the rules shown below are observed,
χ² can be a very flexible tool.

Rules to Follow When Using χ²

     1.  The raw data must always be frequencies. Counting people
         who pass or fail a test is legitimate. Counting the number of
         items that each person passes and getting an average score
         is not legitimate (this is score data). If your data are
         presented in percentages, convert back to frequencies.

     2.  All analyses require that each subject or event be
         counted only once. In some cases, you may have more than
         one measure of a given type on each person. Special
         techniques must be used when this occurs.

     3.  If samples are very small, or if some expected events are
         extremely infrequent, χ² may not be appropriate. There
         must be expected frequencies for 2 x 2 tables of at least
         5 tallies in each cell. For larger tables (2 x 3 or
         greater), all expected frequencies must be 2 or more.
         Special tests can be applied to make adjustments if this
         criterion is not met.

     4.  When something is counted because it is present, absence
         must also be counted. For example, if you wish to see if
         sex is related to passing or failing some objective, you
         must record failures as well as passes in the two groups.
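
Rules 1 and 3 lend themselves to a quick mechanical check before any χ² is
computed. The helpers below are a sketch under the thresholds stated in the
rules (5 per cell for a 2 x 2 table, 2 per cell for anything larger); the
function names are hypothetical.

```python
def expected_frequencies(table):
    """Expected count per cell: row total times column total over grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def meets_rule_3(table):
    """2 x 2 tables need every expected count >= 5; larger tables need >= 2."""
    is_2x2 = len(table) == 2 and len(table[0]) == 2
    minimum = 5 if is_2x2 else 2
    return all(e >= minimum for row in expected_frequencies(table) for e in row)


def percentages_to_counts(percentages, n):
    """Rule 1: turn reported percentages back into frequencies before testing."""
    return [round(p * n / 100) for p in percentages]


tax_issue = [[60, 20], [30, 40]]  # the tax increase table shown earlier
tax_issue_ok = meets_rule_3(tax_issue)
```

Note that the check is made on expected frequencies, not on the observed
tallies: a cell with few observations may still be acceptable if its row
and column totals are large.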



                   SUMMARY OF INFERENTIAL STATISTICS

Kind of Data: Score Data

   t-Test
      Purpose:  To determine whether a significant [difference exists
                between two groups]
      Example:  Did students in the demonstration program perform better
                on a test of achievement at the end of the year than
                pupils in the regular program?

      Purpose:  To determine whether a significant difference exists
                between pretest and posttest
      Example:  Did a significant change take place over normally
                expected gain during the course of the year?

   Analysis of Variance
      Purpose:  To determine whether significant differences exist among
                three or more groups
      Example:  Is student achievement affected by cutting instructional
                time from 60 minutes to [illegible] minutes or 40 minutes?

      Purpose:  To determine what factors account for outcomes of a
                particular program
      Example:  Are gains in student achievement due primarily to
                teaching methods, to time allotted for instruction, or to
                the presence of aides in the classroom?

   Regression Analysis
      Purpose:  To predict what factors account for student outcomes
      Example:  Is it possible to predict which students will benefit
                most from a unit on alcohol abuse?

Kind of Data: Ordered Data

   Sign Test
      Purpose:  To determine whether a significant change has taken place
                between two different testing times
      Example:  Did students take a more active interest in the community
                election after a special unit on citizenship?

   Rank Sums Test
      Purpose:  To determine whether there is a significant difference
                between two groups
      Example:  Are students ranked differently [on] aggressive behavior
                in school [A] compared to school B?

   Kruskal-Wallis Test
      Purpose:  To determine whether significant differences exist among
                three or more groups
      Example:  Do the self-concepts of pupils [with differing] degrees
                of exposure to compensatory education differ from one
                [another]?

   Friedman Test
      Purpose:  To determine significant differences when three or more
                common measurements are made on the same persons over
                time
      Example:  Do students' perceptions of [illegible] change over the
                course of [illegible]?

Kind of Data: Category Data

   Chi-square (χ²)
      Purpose:  To test deviation of obtained frequencies against some
                a priori set of expected frequencies
      Example:  Did parents make favorable [illegible] significantly more
                times than would be expected by chance?

      Purpose:  To determine whether there is a significant relationship
                between two variables
      Example:  Are parents with children in school more likely to favor
                a tax increase election than persons not having children
                in school?



4. DATA INTERPRETATION GUIDELINES

Once the analyses have been performed and certain outcomes have attained
statistical significance, and once the descriptive data have been summarized
and presented in tabular and graphic form:

     What are you justified in saying about the results
     of the evaluation? What cautions must be observed?
     What kinds of remarks avoided?

As a general rule, the evaluator is advised not to make broad, sweeping,
global statements that the data "prove" the success of a program. Statistics
do not prove anything. Statistics provide the basis upon which people make
inferences and interpretations. Be sure you distinguish between the facts
given by statistics and the inferences made by people.

Moreover, the evaluator must be careful to define the population to
which the results are generalizable, citing sampling techniques used to
support claims of generalizability. For example, suppose a questionnaire
intended to obtain a random sample of teacher opinions about an innovation
drew a response from a disproportionate number of female teachers. The
evaluator would have to decide how much stock to place in the questionnaire
responses and would have a responsibility to report his or her professional
judgment on the possible effect of lacking randomness.

Furthermore, the evaluator needs to know and report the relative
strengths and weaknesses of the various instruments used. It is advisable to
acknowledge the difference between data collection instruments which require
people to perform or demonstrate what they know as opposed to just asking
them to make judgments or offer opinions. Judgments, particularly when made
about other people, are prone to large fluctuations due to differences which
exist among people because of their varying standards and background
influences.



F-31 



Thus, with a good design and appropriate analysis, the evaluator at a
minimum should be able to say:

     1.  Which students or student groups are realizing
         achievement and other benefits from the program
         and which are not;

     2.  Which components of the program are paying off
         in student gains and improvements, and in what
         ways;

     3.  What impacts other than changes in student
         learning there have been which have affected
         parents, students, teachers, administrators,
         and others.






F-32 



Learning Exercise 15 



LEARNING EXERCISE 15: COMPUTATION OF χ²



In a 2 x 2 table:






                            Have Children     Do Not Have
                              in School        Children           Total
                                               in School

     Approve Increase             60              20                80
     in Taxes                    (a)             (b)             (a + b)

     Do Not Approve               30              40                70
     Increase in Taxes           (c)             (d)             (c + d)

     Total                        90              60               150
                               (a + c)         (b + d)     (a + b + c + d) = N

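\[
\chi^2 = \frac{N\left(\lvert bc - ad \rvert - \frac{N}{2}\right)^2}
              {(a + b)(c + d)(a + c)(b + d)}
\]

\[
\chi^2 = \frac{150\,(\lvert 600 - 2400 \rvert - 75)^2}{(80)(70)(90)(60)}
       = \frac{150\,(1725)^2}{30{,}240{,}000}
       = \frac{446{,}343{,}750}{30{,}240{,}000}
       = 14.76
\]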
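
The same shortcut can be checked by machine. The function below is a
direct transcription of the 2 x 2 formula in this exercise, including the
subtraction of N/2; only the function name is invented.

```python
def chi_square_2x2(a, b, c, d):
    """N (|bc - ad| - N/2)^2 / ((a + b)(c + d)(a + c)(b + d)) for a 2 x 2 table."""
    n = a + b + c + d
    numerator = n * (abs(b * c - a * d) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Cells a, b, c, d of the tax increase table.
chi2 = chi_square_2x2(60, 20, 30, 40)
```

Rounded to two places the result is 14.76, which exceeds 7.9, the value
required at the .005 level for one degree of freedom.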



In tables larger than 2 x 2:

The computational scheme for tables larger than 2 x 2 requires that an
expected frequency be developed for each cell in the table. The expected
cell frequency is obtained by multiplying the total of the row to which the
cell belongs by the total of the column to which the cell belongs and then
dividing by the grand total.



F-33



In this example, data are arranged for an analysis of the returns from a
questionnaire which asked parents of three different ethnic groups how many
pupils in their school needed a bilingual program.

                         Ethnic Group of Parents

     Students Needing
     Bilingual Programs      (group labels illegible in this copy)     Total

     All                   75 (47.1)     54 (68.2)     12 (25.7)        141

     Most                  64 (66.1)    106 (95.8)     28 (36.0)        198

     Few                   28 (53.8)     82 (78.0)     51 (29.3)        161

     Total                    167           242            91           500



The expected frequencies are given in parentheses ( ) and the 47.1 given in
the first box is

\[
\frac{(141)(167)}{500} = 47.1
\]

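χ² is calculated by subtracting the expected value from the obtained value,
squaring and dividing by the expected value. When this has been done for
each cell, the results are added.

\[
\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}
\]

\[
\begin{aligned}
\chi^2 ={} & \frac{(75 - 47.1)^2}{47.1} + \frac{(54 - 68.2)^2}{68.2}
           + \frac{(12 - 25.7)^2}{25.7} \\
         & + \frac{(64 - 66.1)^2}{66.1} + \frac{(106 - 95.8)^2}{95.8}
           + \frac{(28 - 36.0)^2}{36.0} \\
         & + \frac{(28 - 53.8)^2}{53.8} + \frac{(82 - 78.0)^2}{78.0}
           + \frac{(51 - 29.3)^2}{29.3} \\
       ={} & 58.38
\end{aligned}
\]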
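
For larger tables the cell-by-cell work is tedious by hand, so a short
routine may help in checking it. This sketch follows the steps just
described (expected frequency from row and column totals, then the sum of
squared differences over expected values) and also returns the degrees of
freedom, (r-1)(c-1). The observed counts are those of the bilingual
program example, with one assumption: the middle cell of the "Few" row is
taken as 82, the only value consistent with the printed row total of 161
and column total of 242.

```python
def chi_square_table(observed):
    """Return (chi-square, degrees of freedom) for an r x c table of frequencies."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * column_totals[j] / grand_total
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


bilingual = [
    [75, 54, 12],    # All
    [64, 106, 28],   # Most
    [28, 82, 51],    # Few (82 inferred from the printed totals)
]
chi2, df = chi_square_table(bilingual)
```

Carried out with unrounded expected frequencies, the total comes to a
little under 58.4, which agrees with the printed 58.38 to within rounding.
With 4 degrees of freedom the value needed at the .05 level is 9.5.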






F-34 



χ² Exercises

Directions:

a.  Using the 2 x 2 method of computing χ² as a test of association just
    illustrated, compute χ² for these values, read the level of
    significance from the χ² table at F-26, and draw a conclusion about
    these data.



                              Have          Do Not Have
                            Children         Children          Total

     Approve                   50               30               80

     Do Not Approve            40               30               70

     Total                     90               60              150


\[
\chi^2 = \frac{N\left(\lvert bc - ad \rvert - \frac{N}{2}\right)^2}
              {(a + b)(c + d)(a + c)(b + d)}
\]



b.  Assume your data have a third category of response: "undecided."
    Compute χ² for this 2 x 3 table using the formula

\[
\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e},
\]

    determine the level of significance, and draw a conclusion about these
    data.

                              Have          Do Not Have
                            Children         Children          Total

     Approve                   50               20               70

     Undecided                 10               10               20

     Disapprove                30               30               60

     Total                     90               60              150



ANSWERS

a.
\[
\chi^2 = \frac{150\,(\lvert 1200 - 1500 \rvert - 75)^2}{30{,}240{,}000}
       = \frac{150\,(225)^2}{30{,}240{,}000}
       = \frac{7{,}593{,}750}{30{,}240{,}000}
       = .25
\]

    Sig. > .25 (There are more than 25 chances in 100 that the observed
    differences are due to random fluctuations.)

    Conclusion: There really aren't any differences of opinion between
    persons who have children in school and those who don't.



b.  Expected frequencies are shown in parentheses.

                              Have          Do Not Have
                            Children         Children          Total

     Approve                   50               20               70
                              (42)             (28)

     Undecided                 10               10               20
                              (12)              (8)

     Disapprove                30               30               60
                              (36)             (24)

     Total                     90               60              150

\[
\chi^2 = \frac{(50 - 42)^2}{42} + \frac{(20 - 28)^2}{28}
       + \frac{(10 - 12)^2}{12} + \frac{(10 - 8)^2}{8}
       + \frac{(30 - 36)^2}{36} + \frac{(30 - 24)^2}{24}
\]

\[
\chi^2 = 1.52 + 2.29 + .33 + .50 + 1.00 + 1.50 = 7.14
\]

    df = (2 - 1)(3 - 1) = 2

    Sig. at .05 level (There are 3 chances in 100 that the observed
    differences are due to random fluctuations.)

    Conclusion: There is a relationship between having children in
    school and the opinion adults held.



PROGRAM EVALUATOR'S GUIDE

Section G
REPORT EVALUATION RESULTS



The Evaluation Improvement Program



PRECIS

Program evaluation reporting is largely a matter of good school-community
relations. The principles that apply to positive and open school public
relations apply here as well. Some of these principles, particularly
those that are significant in the framework of interim and annual program
evaluation and the impact of both on longer-term improvement, are treated
briefly in this section.

Relevance, clarity, and specificity are the three critical
characteristics of the program evaluation report. It should address each of
the program's objectives and report forthrightly on whether or not the data
indicate that such objectives have or have not been met. Wording should
be clear and concise with modifications of style and approach wherever
appropriate to fit various audiences. Statements should be specific
enough so that readers will understand what aspects of a program can remain
unchanged, what needs changing, and what needs to be quietly laid to rest.

The report should be sent to those who will lend vigorous support to
a) the continuation of those parts of the program that have been shown to
be successful, and b) improvement of whatever aspects of the program have
been shown to be negative or neutral.



                                CONTENTS

                                                                     Page

1.  IDENTIFYING AUDIENCES . . . . . . . . . . . . . . . . . . . . .  G-1

2.  WHEN TO REPORT  . . . . . . . . . . . . . . . . . . . . . . . .  G-1

3.  THE INTERIM REPORT

4.  PROGRAM MANAGEMENT REVIEW RECORD

    LEARNING EXERCISE 16: RECIPIENTS AND USES OF INTERIM
        EVALUATION DATA . . . . . . . . . . . . . . . . . . . . . .  G-4

    THE FINAL PROJECT REPORT  . . . . . . . . . . . . . . . . . . .  G-6

        A Suggested Outline . . . . . . . . . . . . . . . . . . . .  G-6

    REVIEW AND RELEASE OF THE FINAL REPORT

    LEARNING EXERCISE 17: DETERMINING APPROPRIATE DATA DISPLAYS

    LEARNING EXERCISE 18: WRITING RECOMMENDATIONS FOR THE FINAL
        REPORT  . . . . . . . . . . . . . . . . . . . . . . . . . .  G-14

    LEARNING EXERCISE 19: ANALYZING PROGRAM EVALUATION
        RECOMMENDATIONS




G 



1. IDENTIFYING AUDIENCES

To assure continued program support, it is wise to submit evaluation
information to as many audiences as possible. These audiences may include:

     •  Instructional staff

     •  Administrative staff

     •  Parents

     •  Students

     •  Citizens' Advisory Committees

     •  Superintendent

     •  Board of Education

     •  Total Community

     •  Funding agency

Agencies vary in the needs they have for evaluation data. The needs,
in general, should correspond to the purposes of the evaluation process.
The purposes of program evaluation may fall into any of a number of
categories, among them the following:

     1.  Ascertaining program quality for all concerned

     2.  Providing information for decision makers

     3.  Improving existing programs

     4.  Providing satisfaction to participants

     5.  Communicating with the public



2. WHEN TO REPORT

In planning the program evaluation, the evaluator must determine types of
evidence acceptable to each audience. It must also be determined when each
audience needs to receive the results of the program evaluation. Some
audiences need evaluation reports while the program is in progress. Such
reviews are called interim evaluation reports. Other audiences need






G-2 



evaluation reports only at the end of the program in what is commonly known
as the final project report. A number of audiences will require both interim
and final evaluation reports.



3. THE INTERIM REPORT

The purposes of the interim report are to monitor the program in progress,
to derive information that may improve the program, and to get any early
indications about the probable outcome. Interim evaluation reporting may
be done formally or informally and, occasionally, orally. The report should
be timely and provide the information needed by specific individuals and
groups when they need to act on it. The report should be brief and concise
without being cursory, and it should make very clear how the information it
contains is to be used. Learning Exercise 16 focuses on the variety of uses
and audiences for interim reports. See page G-4.



4. PROGRAM MANAGEMENT REVIEW RECORD

The Program Management Review Record shown on page G-3 can be used both to
monitor an ongoing program and to prepare interim reports.

Both program objectives and activities are listed on the form.
Accompanying columns allow for recording information on interim progress,
specifying the additional assistance that may be needed to sharpen the
objectives and facilitate the activities, and noting whatever corrective
action needs to be taken.

This is a sample of a management support tool that is easy to use, that
assists the program evaluator in monitoring the completion of activities, and
that provides information for interim reporting so that decision makers may
more effectively direct the program. Changes may be made to correct a
possibly serious deficiency, or a not so critical but nevertheless important
omission, in time to make an impact on the final outcomes.






                  PROGRAM MANAGEMENT REVIEW RECORD

Column headings: OBJECTIVES AND ACTIVITIES; COMPLETION DATE; COMPLETED
(Yes / No); REASON FOR DEFICIENCY (if applicable); SUGGESTED ACTION TO
CORRECT DEFICIENCIES (Person Responsible, Action To Be Taken, Completion
Date).

1.0  OBJECTIVE: By June, pupils will have mastered an average of 10 or
     more comprehension skills, as measured by attainment of 80 percent
     or higher on the criterion-referenced tests accompanying the skills
     sequence.
          Completion date: June (year illegible)

     ACTIVITIES:

     1.1  Administer diagnostic test                      Sept. 18

     1.2  Develop pupil and class profile                 Sept. 25

     1.3  Place pupils in instructional sequence          Oct. 2

     1.4  Establish learning centers                      Oct. 16

     1.5  Develop independent activities                  Nov. 16
               Completed: No
               Reason for deficiency: insufficient time
               Person responsible: classroom teacher
               Action to be taken: assistance by resource teacher
               Completion date: Dec. 1



Learning Exercise 16



LEARNING EXERCISE 16: RECIPIENTS AND USES OF INTERIM EVALUATION DATA



This exercise is designed to provide experience working in small groups to
determine who needs interim evaluation data and how the information may be
used. Three statements of objectives are shown together with a listing of
interim information available on each. You are asked to complete the
exercise by predicting which groups will need to have each cluster of
information and what uses they will likely make of it.



Complete the blanks in column three of the table that follows by
indicating for each information cluster one or a combination of the
following:

     Students
     Teachers
     Principal
     Citizens' Advisory Council
     And others as you wish.

Next, complete the blanks in column four. Sample statements are as
follows:

     To designate the skills to develop in the next in-service sessions

     To determine whether the objectives are being met

     To determine methods of increasing parental involvement



INTERIM EVALUATION DATA

School ____________________        Date ____________________

Columns: (1) OBJECTIVE; (2) INTERIM INFORMATION AVAILABLE;
(3) PERSON(S) NEEDING INFORMATION; (4) USE OF THE INFORMATION

OBJECTIVE: By June, 75 percent of the participating pupils will have
mastered 10 or more criterion objectives relating to reading
comprehension skills. (Check off on profile when teacher determines
that the skill has been mastered.)

     Interim information available            Person(s) needing     Use of the
                                              information           information
     Number of skills mastered by             ________________      ________________
     each pupil
     Number of students mastering             ________________      ________________
     skills A, B, C (etc.)

OBJECTIVE: By June, at least 40 parents of participating pupils will
have provided volunteer help in the classroom, as shown on records
kept in the office.

     Interim information available            Person(s) needing     Use of the
                                              information           information
     Number of parents involved               ________________      ________________
     to date
     Names of parents not yet                 ________________      ________________
     involved

OBJECTIVE: Three-fourths of the staff-development sessions held during
the year will be rated as effective by at least 75 percent of the
participants responding to a locally developed rating form.

     Interim information available            Person(s) needing     Use of the
                                              information           information
     Number of participants rating            ________________      ________________
     sessions as effective
     Rating forms with suggested              ________________      ________________
     changes


5. THE FINAL PROJECT REPORT

The purpose of the final report is to summarize the results of the
evaluation: What was the program designed to accomplish? What was done
to accomplish the objectives? What did the program accomplish? How was
the program evaluated? What recommendations are there for further action?
Like the interim report, the final project report should be timely, provide
the information needed by specific individuals when it is needed, and be
clear, brief, and concise.



End-of-the-year program evaluation reporting typically is more formal
in nature than interim reporting and generally is in written form. You must
consider the variety of audiences to whom the final report is to be directed
and select the formats, presentations, and visual aids that will be
appropriate for each specific group.

The evaluation will convey the same basic information to all audiences;
however, the details in the several reports will vary according to the needs
and purposes of the several readerships. Whatever the expected readership,
brevity and clarity always are paramount considerations.

A Suggested Outline

Below are some suggested headings and guides for writing each section.

1. Program Goals and Objectives

   a. Review and translate the goals and objectives of the program into
      the language of the reader.

2. Program Description

   a. Describe the population participating in the program. Include the
      number of pupils, teaching staff, grade level, subject matter, and
      schools in the study.

   b. State the length of the program with beginning and ending dates.

   c. Describe the significant activities, materials, and personnel used
      in the program.

   d. Note parts of the program that are unique.

3. Program Evaluation Procedures

   a. Describe the design, instruments, and analyses which were used
      in evaluating the extent to which the stated objectives were
      accomplished.

   b. Tailor the language and terminology to the audience that is to
      receive the report.

4. Program Accomplishments

   a. Describe the positive results of successful activities.

   b. Describe the marginal results of unsuccessful activities.

   c. Describe unanticipated outcomes and side effects that have been
      observed.

   d. Emphasize changes observed such as score gains, changes in
      attitudes and behaviors.

5. Program Evaluation Conclusions

   a. Present judgments as to why each objective was or was not met.

   b. Present alternative proposals for different approaches in those
      instances in which objectives were not realized.

   c. Present alternative proposals for improvements in those instances in
      which realized objectives could be surpassed in future programs.

   d. Draw summary statements on program effectiveness through a balanced
      review of successful and not-so-successful outcomes.

   e. Whenever possible, relate program effectiveness to program costs.

6. Other Findings

   a. Report on the results of surveys, questionnaires, interviews, and
      other such data that may not fall under the heading of Program
      Accomplishments, but are relevant to program outcomes.

   b. Report on informal findings and conclusions drawn from information
      assembled outside the framework of the program evaluation.



G-8 

7. Recommendations Related to the Program and Program Evaluation

   a. Recommend a preferred alternative for each new approach and improvement
      in the program which would lead to greater achievement of objectives
      in the future.

   b. Suggest revisions in objectives and in affected program features,
      especially regarding those objectives that were not met.

   c. Suggest revisions in program evaluation design, instruments, analyses,
      and procedures that can be applied to subsequent program evaluation
      efforts.



REVIEW AND RELEASE OF THE FINAL REPORT

The evaluator should arrange to have the information in the final report
reviewed by selected members of the program staff and by a sample of those
for whom it is intended. This review should take place while the report is
being written. The reviewers should be asked to verify the description of
the program and to confirm that the types of information presented are those
that are needed, that the formats, explanations, and visual aids in the
report are clear, and that the recommendations are appropriate and consistent
with existing policies, directives, and guidelines.

The final draft should be reviewed by the project director, chief
administrator, and staff who were involved in the collecting and summarizing
of information. This will provide a final check on the report's accuracy
and appropriateness as well as assurance that the report will have the
support of all the program participants.

The publication and release of the final report of a program evaluation
is usually the responsibility of the chief administrator.






Learning Exercise 17
G-9



LEARNING EXERCISE 17: DETERMINING APPROPRIATE DATA DISPLAYS



Examine the documents on the next four pages. Each format contains the same
information on pupil reading achievement, but the information is reported
in three different displays. You are asked to list in the upper right of each
display the audiences in your district who could make good use of information
of this type. Judge the effectiveness of each display for the chosen groups,
and decide whether or not it would be satisfactory as is and what
modifications would make the display clearer.

Audiences may include program staff, citizens' advisory council,
superintendent, board of education, a special interest group, and the total
community. Add others if you wish.



G-10

Learning Exercise 17

Audiences:

DISPLAY 1

PUPILS' GROWTH ON READING TESTS
FROM BEGINNING TO END OF YEAR

[Bar graph. Vertical axis: GRADE LEVEL, 0 to 12. Horizontal axis: FIRST,
SECOND, THIRD, FOURTH, FIFTH, SIXTH. Legible bar labels include 1.0, 1.6,
and 2.7.]



The pre- and posttests covered an instructional period of
seven months; therefore, the expected gain is 7.0 months.
For example, the mean score on the third grade pretest was
2.8 (eighth month of second grade) and the mean score on the
posttest given the same year was 3.6 (sixth month of third
grade), for a gain of eight months based on an instructional
program of seven months.



DISPLAY 2

[Table. The title is only partly legible in the source. Entries marked "?"
are illegible in the source; test names that are illegible here follow the
list of tests given in Display 3. The rows for grades 7 through 12 are blank
on the form.]

Grade  Name of test                     Form        Level   Months   Pupils    Test results        Test results expressed
level                                   (pre/post)          between  receiving (grade-level        as mean scale scores
                                                            pre- and both      scores)
                                                            post-    tests     Pre   Post  Diff.   Pre   Post  Diff.
                                                            tests

1      Cooperative Primary Reading      B / A       12      7        ?         1.0   1.6   .6      132   135   3
2      Cooperative Primary Reading      A / A       ?       ?        ?         1.8   2.7   .9      136   142   ?
3      Cooperative Primary Reading      B / B       23      ?        540       2.8   3.6   .8      142   148   6
4      Stanford Achievement Test        W           Int 1   7        536       3.5   4.2   .7      ?     ?     ?
5      California Test of Basic Skills  Q / Q       ?       7        525       4.6   5.1   .5      355   ?     31
6      California Test of Basic Skills  Q           2       7        521       5.2   6.0   .8      396   417   21



Learning Exercise 17

G-12

Audiences:

DISPLAY 3



The objective of this reading program was to increase pupil reading gain at
least one month for each month of reading instruction. The program began
in September. There was a full teaching staff as well as a complete
complement of teacher aides assisting with the program.

The pretests were administered on October 15 and the posttests on
May 15. Pupils received seven months of instruction during this program.
The tests used included the Cooperative Primary Reading Test, Form A and
Form B; the Stanford Achievement Test, Form W; and the California Test of
Basic Skills, Form Q.

Scores achieved on the administered standardized achievement tests varied
somewhat between the grades tested. The scores are reported below:

     Grade 1     Pupils in grade 1 made six months'
                 growth between the pre- and posttest.
                 The pretest score was 1.0, and the
                 posttest score 1.6.

     Grade 2     Pupils in grade 2 made nine months'
                 growth between the pre- and posttest.
                 The pretest scores were 1.8, and the
                 posttest scores 2.7.

     Grade 3     Pupils in grade 3 made eight months'
                 growth between the pre- and posttest.
                 The pretest scores were 2.8, and the
                 posttest scores 3.6.

     Grade 4     Pupils in grade 4 made seven months'
                 growth between pre- and posttest.
                 The pretest score was 3.5, and the
                 posttest score 4.2.



Learning Exercise 17
G-13



     Grade 5     Pupils in grade 5 made six months'
                 growth between the pre- and posttest.
                 The pretest score was 4.5, and the
                 posttest score 5.1.

     Grade 6     Pupils in grade 6 made eight months'
                 growth between pre- and posttest.
                 The pretest score was 5.2, and the
                 posttest score 6.0.

The objective was reached at grades 2, 3, 4, and 6 but was not met at
grades 1 and 5.

The objective was exceeded by one month at grades 3 and 6, and
exceeded by two months at grade 2.



Learning Exercise 18



LEARNING EXERCISE 18: WRITING RECOMMENDATIONS FOR THE FINAL REPORT



This exercise concerns pupils in a program who have had seven months of
instruction using a diagnostic/prescriptive teaching approach. Your group
will be asked to complete a staff review of the information provided and
develop recommendations to be considered by the appropriate decision makers.

You are asked to assume that this report is being submitted by a program
evaluator to a person in your district who will take some decisive action
based on his or her recommendations. In some situations, the recipient would
be the principal; in others, the program manager, the superintendent, or the
assistant superintendent.

You are asked to write the recommendations section of a final report on
the basis of the information in the sections that are included below and on
pages G-15 and G-16. Considering this information, what recommendations
would you make to the decision maker? What should be left as is? What
changes should be made?



EXCERPTS
FROM A FINAL REPORT

PROGRAM OBJECTIVE

By June 1975, the median score for program participants will have increased
by one month for each month of instruction as measured by pre- and
posttesting on a standardized reading achievement test.

If a class has a median score of 4.3 on November 1 on a standardized
reading test and a median score of 5.1 on May 1, the class has gained .8
years or 8 months. Since the instructional time span was 7 months, the
objective of one month of gain for each month of instruction has been
exceeded.
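The arithmetic in this example can be written out as a short worked equation. It assumes, as the example itself implies, that one tenth of a grade-level score corresponds to one month; the ratio on the right is an added illustration, not a figure from the report.

```latex
\[
\text{gain} = (5.1 - 4.3)\ \text{years} \times 10\ \frac{\text{months}}{\text{year}} = 8\ \text{months},
\qquad
\frac{8\ \text{months gained}}{7\ \text{months of instruction}} \approx 1.14 > 1 .
\]
```

A ratio above 1 means the class gained more than one month per month of instruction, which is what the objective requires.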



Learning Exercise 18
G-15



PROGRAM DESCRIPTION

In part, teachers used the Diagnostic/Prescriptive Teaching (DPT) approach
to instruction developed by the Title I Program. This approach involves
testing each pupil to determine the reading skills he needs to master during
the year. The teacher then uses special materials designed to help each
pupil in the area of greatest need. To determine whether instruction was
effective, the teacher next assesses the pupil for mastery of those skills.
If the pupil has mastered the skill in question, the teacher moves on to
work with the pupil in his or her next area of need.

Using this approach, pupils spend less time working on tasks that are
too easy or too difficult and thereby spend more time experiencing success
with reading tasks at their own respective learning levels.

To make this approach work, teachers can and should use a wide variety
of instructional materials to help the pupil master needed reading skills.



PROGRAM ACCOMPLISHMENTS

Table I below shows how well the participating pupils did this year in
improving their reading skills. Pupils in grade 2, for example, began the
year with an average reading score of 1.8. This means they scored the same
as most first graders who are in the eighth month of school.




PUPILS' GAIN IN READING ACHIEVEMENT
TABLE I

[Bar graph. Vertical axis: GRADE LEVEL. Horizontal axis: FIRST, SECOND,
THIRD, FOURTH, FIFTH, SIXTH. Legible bar labels: 1.8, 2.7, 2.8, 3.6, 3.5,
4.2, 4.6, 5.1, 5.2, 6.0.]



G-16 



Learning Exercise 18 



By the end of the year, second graders were scoring at about 2.7 (an
increase of .9). This means they gained nine months in reading skills
during the year.

The table shows that pupils in grades 2, 3, 4, and 6 gained at least
seven months in reading skills. Pupils in grades 1 and 5 grew six months and
pupils in grade 5 grew five months in reading skills.

In summary, the objective was met at four of six grade levels and nearly
met at two.



OTHER FINDINGS

A questionnaire was administered to participating teachers. The following
findings came from this questionnaire:

     • Eighty percent of the teachers reported that more
       individualized attention could have been given to
       each pupil if the teachers had received more adult
       assistance in the classroom.

     • Ninety percent of the teachers reported that their
       pupils had a wide range of academic weaknesses and
       that it was impossible to provide adequate help to
       each pupil.

     • Sixty-five percent of the teachers requested additional
       in-service training in managing the classroom
       and in grouping pupils for individualized instruction.



CONCLUSIONS

The Diagnostic/Prescriptive Teaching Approach to instruction met the
objectives as planned in grades 2, 3, 4, and 6.

First and fifth grade pupils achieved six months' growth during the
seven months of instruction. This lower-than-anticipated rate of growth
suggests the possibility of problems in the instructional program at these
levels that need to be rectified.



Learning Exercise 18
G-17



RECOMMENDATIONS

Please develop at least three recommendations. After writing them on this
page, transfer them to the transparency provided for you. All transparencies
will be collected at the conclusion of the exercise and used later in group
discussion.



Learning Exercise 19



LEARNING EXERCISE 19: ANALYZING PROGRAM EVALUATION RECOMMENDATIONS




Every program evaluation report should contain conclusions and
recommendations. These usually are drafted by those who are responsible for
analyzing and reporting the data and reviewed by the project director and
perhaps by others. Both conclusions and recommendations are based on the
information developed during the evaluation process, and on the analyses and
interpretations made using that information.

The "Rating of Recommendations" sheet which follows contains ten
recommendations which have been submitted as parts of a variety of program
evaluation reports. In the left margin, the recommendations are consecutively
numbered and recorded. In the two columns to the right are spaces to rate
each of the recommendations according to two criteria:

     Clarity - The wording is clear; you understand what
               the evaluator is trying to say.

     Specificity - The content is specific enough so you have
                   definite clues as to what needs to be done.



Learning Exercise 19

G-19



RATING OF RECOMMENDATIONS SHEET

Rate each of the recommendations at the left according to its clarity and
specificity. Use a scale from 1 to 3; a 1 means it is clear, a 2 means
it is not as clear as it should be, and a 3 means it is not clear. Discuss
with others at your table the reasons why you gave any 2 or 3 ratings.



RECOMMENDATIONS                                     CLARITY    SPECIFICITY

 1. Continued emphasis should be placed on          _______    ___________
    individual and small-group instruction.

 2. Decision making relative to the Title I         _______    ___________
    (Compensatory Education) Program should be
    done whenever possible by those directly
    participating.

 3. There should be continual evaluation of         _______    ___________
    the elements in the school which can cause
    or encourage hostility among pupils. Means
    to eliminate those elements should be
    developed as soon as possible.

 4. Since parent participation is limited by        _______    ___________
    employment, all activities must be
    action-oriented and relevant to the pupil's
    education program.

 5. More emphasis in staff development should       _______    ___________
    be placed on staff attitudes toward the
    pupil regardless of the pupil's academic
    achievement.

 6. Before the beginning of the school year, a      _______    ___________
    schedule should be developed for the
    administration of all evaluation
    instruments. Regularly scheduled dates
    should be set for the evaluator to observe
    project activities to establish the
    reliability of observational protocols.



Learning Exercise 19



RATING OF RECOMMENDATIONS SHEET (cont'd)

Rate each of the recommendations at the left according to its clarity and
specificity. Use a scale from 1 to 3; a 1 means it is clear, a 2 means
it is not as clear as it should be, and a 3 means it is not clear. Discuss
with others at your table the reasons why you gave any 2 or 3 ratings.

RECOMMENDATIONS                                     CLARITY    SPECIFICITY

 7. Parents and teachers should be actively         _______    ___________
    involved in the evaluation process by
    knowing the purpose of each instrument and
    the results as they become available. They
    should see the evaluation process as a
    benefit to them in understanding the pupils
    and how the project can continually be
    improved by a cooperative effort of staff
    and parents.

 8. Parent workshops should be given which          _______    ___________
    stress the practical activities involved in
    conducting a classroom lesson. Material
    preparation skills both for the classroom
    and the home should be taught in a
    practical fashion where parents actively
    prepare a variety of materials they can use.

 9. Plans for lessons that parents are expected     _______    ___________
    to participate in should be distributed one
    week in advance, to allow the parents time
    to prepare for the activities.

10. Better communication systems should be          _______    ___________
    developed to insure that all parents are
    informed of parent meetings and other
    activities of the project.



PROGRAM EVALUATOR'S GUIDE



Section H



APPLY EVALUATION FINDINGS



The Evaluation Improvement Program



CONTENTS

                                                                  Page

LEARNING EXERCISE 20: USE OF EVALUATION INFORMATION . . . . . .   H-2
LEARNING EXERCISE 21: ROADBLOCKS TO PROGRAM EVALUATION  . . . .   H-12



PREPARING TO MAKE MAXIMUM USE OF EVALUATION RESULTS

A basic tenet of this guide is that program evaluation is something that is
done with specific purposes in mind, and that evaluation is useless unless
those purposes are served. In Section A, on purposes and requirements, a
number of different purposes were listed and several possible audiences
identified for whom the respective purposes seem appropriate. It was suggested
that the different audiences likely would have quite different purposes
needing to be served through a program evaluation and therefore would want
different kinds of information to meet their needs. In Section G, on
reporting, diversities of purpose and audience and the consequent needs for
tailoring program evaluation components to meet those diversified
requirements were again emphasized. The steps outlined in those two sections
are probably the most productive things an evaluator possibly can do to
ensure that effective use will be made of the evaluation findings,
conclusions, and recommendations. Brief summaries of these steps follow:

1. Determine all the purposes the program evaluation is to serve.

2. Make explicit various questions that all users would like to have
   answered in satisfying their program evaluation needs.

3. Identify the kinds of information that will prove acceptable as
   evidence bearing upon those questions.

4. Provide interim reports during the progress of the program to
   give early evidence of movement towards program outcomes, even
   if "soft" data need to be used.

5. Prepare the final report clearly and succinctly. The data and data
   interpretations should be presented in a manner that will help
   the reader recall the questions addressed and understand the
   nature and significance of the answers provided.



Learning Exercise 20

H-2



LEARNING EXERCISE 20: USE OF EVALUATION INFORMATION



There are a number of audiences for evaluation reports, some of which are
listed below. Your group is to select an audience from the list or select
another of your own choice. Whichever you choose, your group should put
itself in the position of that audience as you complete this exercise.

     School Board
     Community Group
     Superintendent and Associates
     Principal and Administration
     Teacher Association
     Parent Organization
     Pupil Group
     Other

As leaders in one or another of these groups, determine one or more
purposes that you would want addressed in the program.

Read the final program evaluation report that begins on page H-5 and
discuss it from your points of view. List as many actions, decisions,
recommendations, or other uses that your group can act upon from the
information supplied. List also some things that your group would like to
have seen in the report but that were not included, and note the areas left
without decision as a result of these shortcomings.

Make notes concerning your discussions on each of these three points
on the exercise form on the following page.



Learning Exercise 20

H-3



EXERCISE FORM
USE OF EVALUATION INFORMATION



Audience:



Purpose(s) for program evaluation:



Uses that could be made of the evaluation information, in priority ranking:



Things that might have been included in the evaluation that would have been
helpful to your audience (again, put in priority ranking):



Areas that are left without decisions as a result of these shortcomings:



Learning Exercise 20

H-5

SUNSET UNIFIED SCHOOL DISTRICT



DIAGNOSTIC AND PRESCRIPTIVE READING PROGRAM

JUNE 1976

PROGRAM EVALUATION REPORT

PROGRAM GOAL

The goal of the Diagnostic and Prescriptive Reading Program is to
provide greater reading achievement gains for participating
pupils than the traditional reading program provided for the same
pupils during the previous year.

PROGRAM OBJECTIVES

1. Pupils participating in the Diagnostic and Prescriptive Reading
   Program will obtain an average gain of one month of reading
   achievement for each month of reading instruction as measured
   by pre-post testing with a standardized reading achievement
   test. The pupils' performance in the traditional reading
   program in 1974-75 yielded an average gain of one-half month
   of reading achievement per month of reading instruction.

PROGRAM DESCRIPTION

All pupils in grades one through six in Elmhurst, Diogenes and
Mounthaven elementary schools in the Sunset Unified School
District participated in the Diagnostic and Prescriptive Reading



Learning Exercise 20 

H-6

Program during the school year of 1975-76.

Teachers utilized the Diagnostic/Prescriptive Teaching (DPT)
approach to reading instruction as developed by the district's
ESEA Title I Compensatory Education Program. This approach
involves assessing each pupil to determine his current mastery
level and the skills to be further mastered during the school
year. The teacher then uses special materials designed to assist
each pupil in his areas of need. After each unit of instruction,
the teacher again assesses the pupil for mastery of the specific
skills that were taught to determine whether the instruction was
effective. If the pupil has mastered the skills in question, the
teacher moves on to work with the pupil in his or her next area
of need.

In this approach, teachers use a wide variety of instructional
materials and equipment. Class size was limited to 28-30 pupils.
Each teacher had an instructional aide for the purpose of
assisting the pupils for three hours each day.

The program was in operation from November 1, 1975 to May 31, 1976
for a total of seven months of instructional time.

PROGRAM EVALUATION PROCEDURES

The Cooperative Primary Reading Test was administered to all first,
second and third grade pupils by their classroom teachers on
November 1, 1975 and again on May 31, 1976 for pre-post measurement
of reading achievement. The Reading Test of the California Test
of Basic Skills was administered to pupils in grades four, five
and six by their teachers on the same dates as above.

A questionnaire was developed and administered to each classroom
teacher in May, 1976 to survey their attitudes toward the program.



Learning Exercise 20
H-7



PROGRAM ACCOMPLISHMENTS

The program was implemented as described in the evaluation plan.
The instructional aides were viewed as being helpful by the
classroom teachers in each school. Learning prescriptions were
developed for each pupil by the teacher after assessment of
individual skill levels. With the exception of one school,
learning centers for pupils were established and functioned as
expected. Pupil testing was accomplished as scheduled and materials
and equipment were provided as required in each school. Ongoing
pupil records were adequately maintained as required for diagnostic
prescriptive instruction.

The evaluation procedures determined to what extent the objective
of an average gain of one month of reading achievement for each
instructional month, or in other words, an average gain of seven
months in reading achievement in a seven-month instructional
period for each class in grades one through six, was accomplished.
Table I gives the results in graphic form.



PUPILS' GAIN IN READING ACHIEVEMENT
NOVEMBER 1975 TO MAY 1976
TABLE I

[Bar graph. Vertical axis: GRADE LEVEL, 0 to 12. Horizontal axis: FIRST,
SECOND, THIRD, FOURTH, FIFTH, SIXTH. Legible bar labels: 1.0, 1.6, 1.8,
2.7, 2.8, 3.5, 4.2, 4.6, 5.1, 5.2, 6.0.]



Learning Exercise* 20 
H-8 ^ . ^ . - • . * 

•It will be noted that in a seven-month period pupils in the fi^rst 

grade had a mean reading score of 1/0 in November and 1.6 in . 
May, They, therefore , made an average gain of «6 or six months, 
which is one month short of the stated objective. 

Second grade pupils begain the instructional program with an 
average r^eading score of ^1.8 and ended With an average reading 
sco^e of 2.7 with: a mean gain of .9 or nine months, which is two 
months in excess of the stated objective of seven months. 

" PupTiT^ mean reading achievement 

Score .of 2 . 8 "at the beginning of the program and 3.6 at the end 

. with a mean gain of .8 or eight months' which is one month in 
excess of the stated objective of seven months. 

Fourth grade pupils had an average reading score of 3.5 at the
beginning of the program and 4.2 at the end with a mean gain of
.7 or seven months, which is identical to the stated objective
of seven months.

Fifth grade pupils had an average reading score of 4.6 at the
beginning of the program and 5.1 at the end with a mean gain of
.5 or five months, which is two months less than the stated
objective.

Pupils in the sixth grade began the instructional program with
a mean reading score of 5.2 and ended with 6.0 with a mean gain
of .8 or eight months, which is one month in excess of the
stated objective of seven months.

In summary, pupils in grades two, three, four and six gained a
mean score of seven months or more. Pupils in grades one and
five did not meet the stated objective of seven months, though
grade one missed by only one month and grade five by two months.
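
The grade-by-grade comparison above can be checked with a few lines of
arithmetic. The sketch below is an illustration only and is not part of
the sample report. It reads one tenth of a grade-equivalent score as one
month, as the report does, and the names SCORES, OBJECTIVE_MONTHS,
gain_in_months and compare_to_objective are made up for this example.

```python
# Mean grade-equivalent reading scores quoted in the text: (November, May).
SCORES = {
    "first": (1.0, 1.6),
    "second": (1.8, 2.7),
    "third": (2.8, 3.6),
    "fourth": (3.5, 4.2),
    "fifth": (4.6, 5.1),
    "sixth": (5.2, 6.0),
}

# Stated objective: one month of gain per instructional month, seven months.
OBJECTIVE_MONTHS = 7


def gain_in_months(pre, post):
    """Convert a grade-equivalent difference to months (0.1 grade = 1 month)."""
    # round() absorbs the small floating-point error in the subtraction.
    return round((post - pre) * 10)


def compare_to_objective(scores, objective=OBJECTIVE_MONTHS):
    """Return {grade: (gain in months, months above or below the objective)}."""
    result = {}
    for grade, (pre, post) in scores.items():
        months = gain_in_months(pre, post)
        result[grade] = (months, months - objective)
    return result


if __name__ == "__main__":
    for grade, (gain, diff) in compare_to_objective(SCORES).items():
        status = "met" if diff >= 0 else "not met"
        print(f"{grade:>6}: gain {gain} months ({diff:+d}), objective {status}")
```

Laying the figures out this way makes it easy to compare a report's
narrative and conclusions against its own numbers.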




The overall mean gain of all pupils in grades one through six
was .71 or slightly over seven months, which met the general
objective of all pupils participating in the Diagnostic and
Prescriptive Reading Program making an average gain of one
month of reading achievement for each month of reading instruction
as measured by pre-post testing with a standardized reading achievement
test.
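
An overall mean can be formed in more than one way, and the report does
not say which was used. The sketch below is offered only as an
illustration: it averages the six grade-level gains stated above without
weighting, then shows how a pupil-weighted mean would be computed. The
report gives no enrollment figures, so HYPOTHETICAL_ENROLLMENT is an
invented set of class sizes, and the function names are likewise
assumptions made for this example.

```python
from fractions import Fraction

# Grade-level gains in months as stated in the report, first through sixth.
GAINS_MONTHS = [6, 9, 8, 7, 5, 8]

# Invented class sizes, for illustration only; the report gives none.
HYPOTHETICAL_ENROLLMENT = [40, 25, 25, 25, 40, 25]


def unweighted_mean(values):
    """Simple average of the grade-level means."""
    return Fraction(sum(values), len(values))


def weighted_mean(values, weights):
    """Average of the grade-level means, weighted by number of pupils."""
    total = sum(v * w for v, w in zip(values, weights))
    return Fraction(total, sum(weights))


if __name__ == "__main__":
    print(float(unweighted_mean(GAINS_MONTHS)))
    print(float(weighted_mean(GAINS_MONTHS, HYPOTHETICAL_ENROLLMENT)))
```

With equal class sizes the two means coincide. With unequal class sizes
they can differ, which is one reason an overall figure should be reported
together with the number of pupils tested in each grade.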



OTHER FINDINGS

A locally developed questionnaire administered to all participating
teachers revealed that:

• Eighty percent of the teachers reported that
  more individualized attention could have been
  given to each pupil if the teachers had received
  more adult assistance in the classroom.

• Ninety percent of the teachers reported that
  their pupils had a wide range of academic
  weaknesses and that it was impossible to
  provide adequate assistance to each pupil.

• Sixty-five percent of the teachers requested
  inservice training in managing and grouping
  pupils for individualized instruction.



CONCLUSIONS 



The Diagnostic/Prescriptive Teaching approach to reading
instruction met the objectives as planned in grades one, two,
three, four and six.






The first grade results suggest the possibility of problems in
the instructional program at that level.

An analysis of the testing procedures at the fifth grade level
revealed that different levels of the same test were used at
pretest and post-test times. Use of inappropriate tests
contaminated accurate reporting of pupil accomplishment at the fifth
grade level and therefore interpretation of the data must be
tentative.



RECOMMENDATIONS

1.  The Diagnostic-Prescriptive Reading Program should be
    continued in grades one through six at Elmhurst, Diogenes
    and Mounthaven elementary schools during the 1975-76
    school years with appropriate attention to the stated
    recommendations.

2.  The variability of achievement gains in the various
    grade levels should be further explored. Some grade
    levels seem to be benefiting more from the DPT approach
    than others. It would be well to consider a
    school-by-school analysis of the grade level data.

3.  Explore the variability within schools in the reading
    achievement scores, particularly in those grades which
    did not meet the objective of seven months gain in a
    seven-months instructional program.

4.  Investigate the situation of one school not providing
    learning centers for pupils. Explore the possibility
    of testing learning centers vs. no learning centers
    in next year's evaluation design.






5.  Alternate forms of the same level of the test should
    be used in pre- and post-testing at all grade levels.

6.  Consider the establishment of a cooperative teaching
    arrangement and allow pupils who lack certain skills
    to work with teachers who have special expertise in
    these areas.

7.  Efforts should be made to increase the number of
    hours worked by instructional aides or to increase
    the number of aides.

8.  Consider the use of volunteer parents as aides in
    the classroom.

9.  Provide additional inservice education opportunities
    for teachers in the area of managing and grouping
    pupils for individualized instruction.

10. The evaluation of the Diagnostic-Prescriptive Reading
    Program should be continued for the 1976-77 school year.




Learning Exercise 21
H-12



LEARNING EXERCISE 21: ROADBLOCKS TO PROGRAM EVALUATION

There are many reasons why the evaluations of educational programs are
resisted. Participants will now be divided into "role" groups of three or
four people each: board members, principals, classroom teachers, parents,
superintendents, and so on.

Each group, looking at program evaluation from the viewpoint of its
role, should list as many roadblocks as possible to effective use of
evaluation results. After these have been posted and reported on, the
workshop leader will promote discussions of ways to overcome, circumvent,
or minimize each roadblock identified.



PROGRAM EVALUATOR'S GUIDE 



Section I 



SELECTED BIBLIOGRAPHY 



The Evaluation Improvement Program 




Ahmann, J. Stanley and Marvin D. Glock. Evaluating Pupil
Growth: Principles of Tests and Measurements, 4th Ed. Boston:
Allyn and Bacon, Inc., 1971.

Aiken, Jr., Lewis R. Psychological and Educational Testing,
1st Ed. Boston: Allyn and Bacon, Inc., 1971.

Anderson, S. B., Ball, S., Murphy, R. T., and others.
Encyclopedia of Educational Evaluation. San Francisco: Jossey-Bass,
1975.

Bloom, B. S. Taxonomy of Educational Objectives: Cognitive
Domain. New York: David McKay, 1956.

Borich, G. D. (Ed.). Evaluating Educational Programs and
Products. Englewood Cliffs, New Jersey: Educational Technology
Publications, 1974.

Brown, Frederick G. Measurement and Evaluation, 1st Ed.
Illinois: F. E. Peacock Publishers, Inc., 1971.

Brown, Frederick G. Principles of Educational and
Psychological Testing. Illinois: The Dryden Press, Inc., 1970.

Cronbach, Lee J. Essentials of Psychological Testing, 2nd Ed.
New York: Harper & Row, 1960.

Ebel, Robert L. Measuring Educational Achievement. Englewood
Cliffs, New Jersey: Prentice Hall, 1965.

Feldt, L. S. "What Size Samples for Methods/Materials
Experiments?", Jr. of Ed. Measure., Vol. 10, No. 3, 1973, 225.

Flanders, N. A. Teacher Influence, Pupil Attitudes and
Achievement. Cooperative Research Monograph No. 12, OE 25040.
Washington, D.C.: U.S. Government Printing Office, 1965.

Good, T. L. and J. E. Brophy. Looking in Classrooms. San
Francisco: Harper & Row, 1973, pp. 62 and 63. Reprinted c 1975,
Harper & Row Publishers.

Gronlund, Norman E. Preparing Criterion-Referenced Tests for
Classroom Instruction, A Title in the Current Topics in Classroom
Instruction Series. New York: The Macmillan Company, 1973.

House, E. R. (Ed.). School Evaluation: Politics and Process.
Berkeley: McCutchan, 1973.

Isaac, Stephen and William B. Michael. Handbook in Research
and Evaluation. San Diego: Robert R. Knapp, 1971.






Krathwohl, D. R., B. S. Bloom and B. B. Masia. Taxonomy
of Educational Objectives: Affective Domain. New York: David
McKay, 1964.

Krejcie, Robert V. and Daryle W. Morgan. "Determining
Sample Size for Research Activities," Ed. & Psycho. Meas.,
Vol. 30, 1970, 607-610.

Kropp, R. P. and C. Verner. An attitude scale technique
for evaluating meetings. Adult Education, VII (4), Summer, 1957.

Linton, M. and P. S. Gallo, Jr. The Practical Statistician:
Simplified Handbook of Statistics. Monterey, California:
Brooks/Cole, 1975.

Loret, P. G., et al. Anchor Test Study: Equivalence and norms
tables selected for reading achievement tests (grades 4, 5, & 6).
Office of Education, Report 74-305, U.S. Government Printing Office,
Washington, D.C., 1974.

Marshall, Jon Clark and Loyde Wesley Hales. Essentials of
Testing. California: Addison-Wesley Publishing Company, 1972.

Marshall, Jon Clark and Loyde Wesley Hales. Classroom Test
Construction. California: Addison-Wesley Publishing Company, 1971.

McFarland, Susan J. and Carl F. Hereford. Statistics and
Measurement in the Classroom, Psychological Foundations of
Education Series, 2nd Ed. Iowa: William C. Brown Company Publishers.

Mehrens, William A. and Irvin J. Lehmann. Measurement and
Evaluation in Education and Psychology. California: Holt, Rinehart
and Winston, Inc., 1973.

Metfessel, Newton S. and William B. Michael. A Paradigm
Involving Multiple Criterion Measures for the Evaluation of
Effectiveness of School Programs, Educational and Psychological
Measurement, 1967.

National Assessment of Educational Progress. Citizenship:
National Results. Denver, Colorado, November 1970.

Nunnally, Jr., Jum C. Introduction to Psychological Measurement.
California: McGraw-Hill Book Company, 1970.

Payne, D. A. (Ed.). Curriculum Evaluation. Lexington,
Massachusetts: D. C. Heath, 1974.

Popham, W. J. Educational Evaluation. Englewood Cliffs,
New Jersey: Prentice Hall, 1975.

Provus, M. M. Discrepancy Evaluation: For Educational
Program Improvement and Assessment. Berkeley: McCutchan, 1971.






Psychological Corporation. Test Service Bulletin #48.
Printed by permission of the publisher, Psychological Corporation.

Simpson, E. J. "The Classification of Educational Objectives,
Psychomotor Domain". Illinois Teacher, 1966-67.

Simpson, E. J. Scheme for Classification of Educational
Objectives: Psychomotor Domain. Unpublished Manuscript, 1969.

Slonim, M. J. Sampling: A Quick, Reliable Guide to Practical
Statistics. New York: Simon & Schuster, 1967.

"Small-sample techniques." The NEA Research Bulletin, Vol. 38,
December 1960, 99-104.

Smith, Fred M. and Sam Adams. Educational Measurement for the
Classroom Teacher, 2nd Ed. California: Harper & Row, Publishers,
1972.

Stake, R. E. The Countenance of Educational Evaluation.
Teachers College Record, 1967, 68, 523-540.

Thorndike, Robert L. and Elizabeth Hagen. Measurement and
Evaluation in Psychology and Education, 3rd Ed. New York: John
Wiley and Sons, Inc., 1969.

Tyler, Leona E. Tests and Measurements, 2nd Ed. New
Jersey: Prentice Hall, Inc., 1971.

Tyler, R. W., Gagne, R. M. & Scriven, M. Perspectives on
Curriculum Evaluation. (American Educational Research Association
Monograph #1 in Curriculum Evaluation Series). Chicago: Rand
McNally, 1967.

Tyler, R. W. (Ed.). Educational Evaluation: New Roles, New
Means. Chicago: University of Chicago Press, 1969.

Tyler, R. W. and Wolf, R. M. Crucial Issues in Testing.
Berkeley: McCutchan, 1974.

Walberg, H. J. Evaluating Educational Performance. Berkeley:
McCutchan, 1974.

Walker, Helen M., and J. Lev. Statistical Inference. New York:
Holt, Rinehart and Winston, 1953.

Wick, John W. Educational Measurement: Where Are We Going and
How Will We Know When We Get There? Ohio: Charles E. Merrill
Publishing Company, 1973.

Womer, Frank B. Guidance Monograph Series, Series III: Testing.
Basic Concepts in Testing. Houghton Mifflin Company, 1968.

Worthen, B. R. and Sanders, J. R. Educational Evaluation:
Theory and Practice. Worthington, Ohio: C. A. Jones, 1973.



PROGRAM EVALUATOR'S GUIDE 



Section J 



APPENDICES 



The Evaluation Improvement Program 






CONTENTS

APPENDIX A: RESOURCES FOR INFORMATION ABOUT OBJECTIVES AND INSTRUMENTS . . J-1
APPENDIX B: SELECTED LIST OF TEST PUBLISHERS . . . . . . . . . . . . . . J-10
APPENDIX C: MULTIPLE CRITERION MEASURES . . . . . . . . . . . . . . . . . J-12






APPENDIX A



RESOURCES FOR INFORMATION ABOUT 
OBJECTIVES AND INSTRUMENTS 



I.   Test Bulletins Published at Irregular Intervals

     Normline, Harcourt Brace Jovanovich, Inc.
     Test Data Reports, Harcourt Brace Jovanovich, Inc.
     Test Service Bulletins, Harcourt Brace Jovanovich, Inc.
     Test Service Bulletins, The Psychological Corporation
     Test Service Notebook, Harcourt, Brace & World, Inc.
     Testing Today, Houghton Mifflin Company

II.  Newsletters

     The ACT Newsletter, American College Testing Program
     Education Recaps, Educational Testing Service
     ETS Developments, Educational Testing Service
     Items, Cooperative Test Division, Educational
        Testing Service
     Measurement in Education, National Council on
        Measurement in Education
     NAEP Newsletter, National Assessment of Educational
        Progress
     Test Collection Bulletin, Educational Testing Service

III. Educational and Psychological Journals

     American Educational Research Journal
     Educational and Psychological Measurement
     Journal of Educational Measurement
     Journal of Educational Psychology
     Measurement and Evaluation in Guidance
     Psychological Abstracts: "Methodology and Research
        Technology: Testing" and "Educational Psychology:
        Testing"
     Review of Educational Research

IV.  Annual Reports and Proceedings

     Annual Reports, College Entrance Examination Board
     Annual Reports, Educational Testing Service
     Proceedings, Annual Invitational Conference on Testing
        Problems, Educational Testing Service
     Proceedings, Annual Western Regional Conference on Testing
        Problems, Educational Testing Service






V.   Published Objectives and Objective-Referenced Tests

     Instructional Objectives Exchange, Box 24095,
        Los Angeles, California 90024
     SCORE, Westinghouse Learning Corporation,
        P.O. Box 30, Iowa City, Iowa 52240
     National Evaluation Systems, P.O. Box 226,
        Amherst, Massachusetts 01002

VI.  Miscellaneous Paperback Books and Bulletins

     EVALUATION AND ADVISORY SERIES, Educational Testing
     Service

     1. ETS Builds A Test, 1965.
     2. Locating Information on Educational Measurement:
        Sources and References, 1969.
     3. Making the Classroom Test: A Guide for Teachers,
        Second Edition, 1961.
     4. Multiple-Choice Questions: A Close Look, 1963.
     5. Selecting an Achievement Test: Principles and
        Procedures, Second Edition, 1961.
     6. Short-Cut Statistics for Teacher-Made Tests,
        Second Edition, 1964.

     GUIDANCE MONOGRAPH SERIES, SET III: Testing, Houghton
     Mifflin Company, 1968

     1. Modern Mental Measurement: A Historical Perspective
     2. Basic Concepts in Testing
     3. Types of Test Scores
     4. School Testing Programs
     5. Intelligence, Aptitude, and Achievement Testing
     6. Interest and Personality Inventories
     7. Tests on Trial
     8. Automated Data Processing in Testing
     9. Controversial Issues in Testing

     Engelhart, M. D. Improving Classroom Testing. Washington:
     National Education Association, 1964.

     French, J. E., and W. B. Michael. Standards for Educational
     and Psychological Tests and Manuals. Washington: American
     Psychological Association, 1965.

     McLaughlin, K. F. Interpretation of Test Results.
     Washington: U.S. Government Printing Office, 1964.



VII. An Annotated Bibliography of Guides for Test Selection

Compiled by John Jegi, Director, ACCESS Information
Center, Contra Costa County Superintendent of Schools
Office.

Buros, Oscar K., ed. Mental Measurements Yearbooks: 1st ed. -
1938 (re-issued 1972); 2nd ed. - 1941 (re-issued 1972); 3rd ed. -
1949; 4th ed. - 1953; 5th ed. - 1959; 6th ed. - 1965; 7th ed. -
1972 (2 vols.). Highland Park, New Jersey: The Gryphon Press.

Single best source of critical reviews of tests. Each yearbook
contains critical reviews of all obtainable published tests and
books on measurement written in English. Most publications are
reviewed independently by two or more specialists. Reviews in
earlier editions are cross-referenced in later ones.

________. Tests in Print. Highland Park, New Jersey: The Gryphon
Press, 1961.

A comprehensive test bibliography and index to the first five
books in the Mental Measurements Yearbook series. Each test
mentioned includes information concerning test title, appropriate
grade levels, publication data, special short comments about the
test, number and types of scores provided, authors, publishers, and
reference to test reviews in Mental Measurements Yearbooks.

________. Tests in Print, Volume 2. Highland Park, New Jersey: The
Gryphon Press, 1974.

Index to tests still in print that are listed in all seven Mental
Measurements Yearbooks. Includes bibliographies of references
on the construction, use and validity of specific tests published
through 1971.

________. Personality Tests and Reviews. Highland Park, New Jersey:
The Gryphon Press, 1970.

Provides compilation of personality test reviews and specific test
bibliographies listed in the first six Mental Measurements Yearbooks.
Includes new material on personality testing, including a
comprehensive bibliography of 513 personality tests and 7,116 new
references dealing with the construction, use, and validity of
specific tests. Also includes a master index to the nonpersonality
tests, reviews and references to the first six Mental Measurements
Yearbooks. Eighty tests (new, revised, or supplemented since the
Sixth Yearbook and not listed in the Seventh Yearbook) are included.

________. Reading Tests and Reviews. Highland Park, New Jersey:
The Gryphon Press, 1968.

Includes a comprehensive bibliography of reading tests as of
early 1968, a reprinting of all reading test reviews in the
first six Mental Measurements Yearbooks, and a master classified
index to all other tests and reviews in the first six Yearbooks.
Includes information about 33 reading tests (new, revised, or
supplemented since the Sixth Yearbook) which are not listed in
the Seventh Yearbook.

Hoepfner, Ralph, Ed. CSE Elementary School Test Evaluations.
Los Angeles: Center for the Study of Evaluation, UCLA, 1970.

This book contains a compendium of tests, keyed to educational
objectives of elementary school education, and evaluated by
measurement experts and educators for such characteristics as
meaningfulness, examinee appropriateness, administrative usability
and quality of standardization.

Hoepfner, Ralph; Stern, Carolyn; and Nummedal, Susan G., eds.
CSE-ECRC Preschool/Kindergarten Test Evaluations. Los Angeles:
Center for the Study of Evaluation and the Early Childhood
Research Center, UCLA, 1971.

This book contains a compendium of tests, keyed to educational
objectives of early childhood education, and evaluated by
measurement experts and educators.

Johnson, Orval G., and Bommarito, James. Tests and Measurements
in Child Development: A Handbook. San Francisco: Jossey-Bass,
Inc., Publishers, 1971.

A guide to more than 300 measures of child behavior and development
not available from test publishers. Authors cite six criteria for
inclusion: (1) suitability for use with children between birth
and age twelve; (2) availability to professionals; (3) unpublished,
not commercially available; (4) permit development of norms and
reliability and validity data; (5) include enough information for
effective use; (6) technically useable. Measures classified in ten
categories: (a) cognitive, (b) personality and emotional
characteristics, (c) children's perceptions of environment, (d)
self-concept, (e) actual environment, (f) motor skills, brain injury,
sensory perception, (g) physical attributes, (h) attitudes and
interests not otherwise classified, (i) social behavior, and
(j) measures not fitting the above categories.

Johnson, T. J., and Hess, R. J. Tests in the Arts. St. Louis:
Central Midwestern Regional Educational Laboratory (CEMREL),
3120 - 59th Street, St. Louis, Mo., 1970.

Indexes and abstracts all known measuring instruments and tests
applicable to the arts and provides a brief but comprehensive
overview of the various psychometric methodologies utilized in
the development of the instruments.



Robinson, John P., and Shaver, Philip R. Measures of Social
Psychological Attitudes. Ann Arbor, Michigan: Publications
Division, Institute for Social Research, University of Michigan,
1969.

Review of 106 test instruments grouped into eight general
categories: Life satisfaction; Self-esteem; Alienation;
Authoritarianism; Socio-political attitudes; Values; General attitudes
toward people; and Religious attitudes. Evaluation of instruments'
psychometric properties given as well as ease of administration
and scoring.

*Available in 1974 revised edition.

Simon, A. and E. G. Boyer. Mirrors for Behavior: An Anthology of
Classroom Observation Instruments. Philadelphia: Research
for Better Schools, 1967.

An annotated compilation of 86 observation instruments
representing a variety of approaches, both in the
affective and cognitive domains. Extensive bibliography.

Walker, Deborah K. Socioemotional Measures for Preschool and
Kindergarten Children. San Francisco: Jossey-Bass, Inc.,
Publishers, 1973.

Description of 143 tests and measures of social and emotional
development including titles and dates of publication or
copyright; author; appropriate age range; measurement technique;
source in which measure is described; description of the
instrument; norms available; validity studies; and reliability
evidence.

Wall, Janet, and Summerlin, Lee. Standardized Science Tests: A
Descriptive Listing. Washington, D.C.: National Science
Teachers Association, 1973. (Order direct from NSTA, 1201
Sixteenth Street, NW, Washington, D.C. 20036 $1.50).

A compilation of virtually all the standardized science tests
published since 1959 available to elementary and secondary
science teachers. (57 pages).

The following documents are from the ERIC Clearinghouse on Tests,
Measurement, and Evaluation, Educational Testing Service, Princeton,
New Jersey and are available on microfiche. Items selected include
clearinghouse publications through April, 1975.

ED 056 082. Rosen, Pamela, and Horne, Eleanor V. Language Development
Tests: An Annotated Bibliography. 1971.

Brief annotations of currently available language development
measures appropriate for use with preschool children as well as
with lower elementary grade children (grades 1 through 3) are
presented. The annotation provides information concerning the
purpose of the test; the groups for which it is intended; test
subdivisions or tested skills, behaviors, or competencies;
administration; scoring; interpretation; and standardization. (14 pages)






ED 056 083. Guthrie, P. D., and Horne, Eleanor V. School
Readiness Measures: An Annotated Bibliography. 1971.

Brief annotations of currently available general school readiness
measures are presented. The annotation provides information
concerning the purpose of the test; the groups for which it is
intended; test subdivisions or tested skills, behaviors, or
competencies; administration; scoring; interpretation; and
standardization. An alphabetical listing of the instruments which
indicates the ages for which each is suitable is also included.
(26 pages)

ED 056 085. Guthrie, P. D., and others. Measures of Social Skills:
An Annotated Bibliography. 1971.

Brief annotations of instruments concerned with a variety of social
skills measures appropriate for use with children from the
preschool level through the third grade are provided. Included
are tests designed to measure social competency, interpersonal
competency, social maturity, social sensitivity, and attitudes
toward others. The annotation provides information concerning
the purpose of the test; the groups for which it is intended;
test subdivisions or tested skills, behaviors, or competencies;
administration; scoring; interpretation; and standardization. An
age table is also provided which lists the tests alphabetically,
indicates the ages for which each instrument is considered
suitable, and gives the page on which each annotation appears.
(28 pages)

ED 074 071. Knapp, Joan, Comp. An Omnibus of Measures Related to
School-Based Attitudes. 1972.

Summaries are provided for 16 measures of school-based attitudes.
All of the instruments are paper and pencil, self-report
inventories. Some are designed for children 4-8 years of age; others
are for students in grades 12-14. Each of the instruments is
presented in the following format: Title, Description, Subjects,
Response Mode, Scoring, and Comments. The 16 measures are:
Survey of Study Habits and Attitudes; School Interest Inventory;
The Student Opinion Poll II; School Morale Scale; Measures of
School and Learning Attitudes; Attitudes Toward Education;
Polittle Sentence Completion Test; Pictographic Self Rating
Scale; Children's Attitudinal Range Indicator; When Do I Smile?;
Attitude Toward Any School Subject; Attitude Instrument to
Evaluate Student Attitude Toward Science and Scientists; Inventory
of Reading Attitude; A Childhood Attitude Inventory for Problem
Solving; Mathematics Attitude Scale; and a Semantic Differential
for Measuring Attitudes of Elementary School Children Toward
Mathematics. Fifteen references are provided. (24 pages)

ED 080 534. Knapp, Joan, Comp. A Selection of Self Concept
Measures. 1973.






This compilation is comprised of descriptions of instruments
measuring self-concept. The instruments were chosen on the basis
of the following criteria: they should be suitable for and
reflect the full age range of children in school; each of the
categories in Coller's model (self report, projective, behavior
trace, and direct observation) should be represented; they should
have been designed with the so-called "normal" population in mind
rather than a psychopathological population; they should have enough
information accompanying them to enable investigators to use them
effectively; and they should reflect a variety of means of
presentation (e.g., pictorial items, semantic differential). The
instruments described are: Work Posting; The Children's
Self-Social Constructs Test; The Children's Self-Concept Index;
Responsible Self-Concept Test; Behavior Rating Form; Coopersmith
Self-Esteem Inventory; Tennessee Self-Concept Scale; How I See
Myself Scale (Primary and Secondary Form); A Semantic Differential
for Measurement of Global and Specific Self-Concepts; The
Piers-Harris Children's Self-Concept Scale (The Way I Feel About Myself);
Michigan State General Self-Concept of Ability; Michigan State
Self-Concept of Ability in Specific Subjects Scales; and Self-Esteem
Measure for Neighborhood Youth Corps Enrollees. (31 pages)

ED 083 318. Rosen, Pamela. Tests for Educationally Disadvantaged
Adults. 1973.

Sixty-five instruments, published between 1925 and 1972, are
described in this annotated bibliography. The devices are intended
for adults who have received only an elementary education, and
adults who have completed high school but whose education was
impaired due to learning disabilities or other educational
handicaps. Both achievement and aptitude measures are included, covering
such areas as intelligence, ability, learning skills, non-verbal
reasoning, vocabulary, reading, and mathematics. The Spanish
editions of several tests in English as a second language are
presented. The publisher's name and address is provided for each
instrument. (12 pages)

ED 083 319. Rosen, Pamela, ed. Self-Concept Measures: Grade 7 and
Above. 1973.

This 34-item annotated test bibliography deals with a variety of
currently available measures of self-concept and self-esteem. For
the purposes of this listing, self-concept was defined as a
multidimensional construct encompassing the range of an individual's
perceptions and evaluations of himself. Many of the devices
contained herein emphasize the learner's self-concept or the
individual's conceptions of himself in the school environment.
However, several global measures are also described. Various
methods for assessing self-concept, including direct observations,
behavior ratings, self-reports, and projective techniques, are
presented. The instruments described in this listing are
appropriate for use in grade seven and above. Information was obtained
from the holdings and references of the Educational Testing Service
Test Collection. (7 pages)




ED 083 320. Rosen, Pamela, ed. Measures of Self-Concept, Grades
4-6. 1973.

This 31-item test bibliography deals with a variety of currently
available measures of self-concept and self-esteem. For the
purposes of this listing, self-concept was defined as a
multidimensional construct encompassing the range of an individual's
perceptions and evaluations of himself. Many of the devices
contained herein emphasize the learner's self-concept or the
child's conception of himself in the school environment. However,
several global measures are also described. Various methods for
assessing self-concept, including direct observations, behavior
ratings, self-reports, and projective techniques, are presented.
The instruments described in this listing are appropriate for use
with children in grades four through six. Information was obtained
from the holdings and references of the Educational Testing
Service Test Collection. (6 pages)



ED 083 321. Rosen, Pamela, ed. Attitudes Toward School and School
Adjustment, Grades 4-6. 1973.

This 31-item test bibliography lists currently available measures
of attitudes toward school and school adjustment. The construct,
attitudes toward school, encompasses pupils' attitudes toward
themselves as learners, learning as a process, the school
environment or classroom situation, specific school subjects, and teachers.
In addition, the pupils' behavior is considered if it is indicative
of their adjustment or lack of adjustment to the educational
environment. Teacher ratings, self-report devices, and observation
techniques are the various methods for assessing these attitudinal
elements which have been included in the listing. Instruments
described in this bibliography are appropriate for use with
students in grades four through six. Information was obtained from
the holdings and references of the Educational Testing Service
Test Collection. (8 pages)

ED 083 322. Rosen, Pamela, ed. Assessment of Teachers. 1973.

This 53-item test bibliography lists a variety of currently
available measures which may be used to assess teachers. Among
the devices described are: instruments which are completed by
teachers and which provide an indication of their proficiency in
or knowledge of both general and specific areas in education;
self-report attitudinal measures for teachers; instruments which
are completed by students and which may indicate their attitudes
toward and/or evaluations of a particular teacher or classroom
situation which is dependent upon the teacher; and observational
devices that may be used to consider such factors as the teacher's
competency, teaching style, characteristics and/or interaction
with pupils. Information was obtained from the holdings and
references of the Educational Testing Service Test Collection.
(ti pages) . 






ED 083 323. Rosen, Pamela, ed. Attitudes Toward School and School
Adjustment. Grades 7-12. 1973.

This 53-item test bibliography lists currently available measures
of attitudes toward school and school adjustment. The construct,
attitudes toward school, encompasses pupils' attitudes toward
themselves as learners, learning as a process, the school
environment or classroom situation, specific school subjects, and
teachers. In addition, the pupils' behavior is considered if it is
indicative of their adjustment or lack of adjustment to the
educational environment. Teacher ratings, self-report devices, and
observational techniques are the various methods for assessing these
attitudinal elements which have been included in the listing.
Instruments described in this bibliography are appropriate for use
with students in grades seven through twelve. Information was
obtained from the holdings and references of the Educational Testing
Service Test Collection. (7 pages).

ED 086 737. Rosen, Pamela. Self-Concept Measures. Head Start
Test Collection. 1973.

Forty-four items published between 1963 and 1972 are listed in
this annotated bibliography which deals with a variety of
self-concept measures appropriate for use with children from the
preschool level through the third grade. For the purposes of
this listing, self-concept was defined as a multi-dimensional
construct encompassing the range of a child's perceptions and
evaluations of himself. Many of the sources emphasize the
learner's self-concept or the child's conception of himself in
the school environment. However, several global measures are
also described. (8 pages).

ED 099 427. Knapp, Joan. A Collection of Criterion-Referenced
Tests. TM Report No. 31. 1974.

Twenty-one criterion-referenced tests are cited and for each the
following information is provided: description, format and
administration, response mode and scoring, technical information,
and references. The tests cited are the result of an attempt
made to bring together tests designated in the Educational Testing
Service Test Collection, a library of tests and test related
information, as criterion-referenced tests. This annotated
bibliography does not list every test
that has been labeled criterion-referenced; however, it typifies
the variety of tests that are available under the rubric
criterion-referenced. Also, criterion-referenced and
norm-referenced tests are defined in several ways, and their
advantages, limitations, and uses are briefly explored. (13 pages)



APPENDIX B 

SELECTED LIST OF TEST PUBLISHERS

AMERICAN COLLEGE TESTING PROGRAM, P.O. Box 168, Iowa City, Iowa 52240

AMERICAN GUIDANCE SERVICE, INC., Publishers' Building, Circle Pines,
Minnesota 55014

AUSTRALIAN COUNCIL FOR EDUCATIONAL RESEARCH, Frederick Street,
Hawthorn E.2, Victoria, Australia

BOBBS-MERRILL COMPANY, INC., 4300 West 62nd Street, Indianapolis,
Indiana 46268

BUREAU OF EDUCATIONAL RESEARCH AND SERVICE, University of Iowa,
Iowa City, Iowa 52240

CALIFORNIA TEST BUREAU/MCGRAW-HILL, Del Monte Research Park,
Monterey, California 93940

COMMITTEE ON DIAGNOSTIC READING TESTS, INC., Mountain Home,
North Carolina 28758

CONSULTING PSYCHOLOGISTS PRESS, INC., 577 College Avenue, Palo Alto,
California 94306

COOPERATIVE TESTS AND SERVICES, Educational Testing Service,
Princeton, New Jersey 08540

EDUCATIONAL AND INDUSTRIAL TESTING SERVICE, P.O. Box 7234, San Diego,
California 92107

EDUCATIONAL TEST BUREAU, Division of American Guidance Service,
Inc., 720 Washington Avenue, S.E., Minneapolis, Minnesota 55414

EDUCATIONAL TESTING SERVICE, Princeton, New Jersey 08540

GUIDANCE CENTRE, Ontario College of Education, University of
Toronto, 1000 Yonge Street, Toronto 289, Ontario, Canada

HARCOURT BRACE JOVANOVICH, INC., 757 Third Avenue, New York,
New York 10017

HOUGHTON MIFFLIN COMPANY, 110 Tremont Street, Boston, Massachusetts
02107

INSTITUTE FOR PERSONALITY AND ABILITY TESTING, 1602 Coronado Drive,
Champaign, Illinois 61822

LYONS AND CARNAHAN, 407 East 25th Street, Chicago, Illinois 60616



PERSONNEL PRESS, INC., 20 Nassau Street, Princeton, New Jersey 01

THE PSYCHOLOGICAL CORPORATION, 304 East 45th Street, New York,
New York 10017

PSYCHOMETRIC AFFILIATES, Box 31167, Munster, Indiana 46321

PUBLIC PERSONNEL ASSOCIATION, 1313 East 60th Street, Chicago,
Illinois 60637

SCHOLASTIC TESTING SERVICE, INC., 480 Meyer Road, Bensenville,
Illinois 60106

SCIENCE RESEARCH ASSOCIATES, INC., 259 East Erie Street, Chicago,
Illinois 60611

STANFORD UNIVERSITY PRESS, Stanford, California 94305 

STOELTING COMPANY, 424 North Homan Avenue, Chicago, Illinois 60

TEACHERS COLLEGE PRESS, Teachers College, Columbia University,
New York, New York 10027



APPENDIX C 



MULTIPLE CRITERION MEASURES 



A. Indicators of Status or Change in Cognitive and Affective
Behaviors of Students in Terms of Standardized Measures
and Scales

1. Standardized achievement and ability tests, the scores
on which allow inferences to be made regarding the extent
to which cognitive objectives concerned with knowledge,
comprehension, understanding, skills and applications
have been attained.

2. Standardized self-inventories designed to yield measures
of adjustment, appreciations, attitudes, interests, and
temperament from which inferences can be formulated
concerning the possession of psychological traits (such
as defensiveness, rigidity, aggressiveness, cooperativeness,
hostility, and anxiety).

3. Standardized rating scales and checklists for judging the
quality of products in visual arts, crafts, shop activities,
penmanship, letter-writing, fashion design, and other
activities.

B. Indicators of Status or Change in Cognitive and Affective
Behaviors of Students by Informal or Semiformal Teacher-made
Instruments or Devices

1. Interviews: frequencies and measurable levels of responses
to formal and informal questions raised in a face-to-face
interrogation.

2. Questionnaires: frequencies of responses to items in an
objective format and numbers of responses to categorized
dimensions developed from the content analysis of responses
to open-ended questions.

3. Self-concept perceptions: measures of current status and
indices of congruence between real self and ideal self
often determined from use of the semantic differential or
Q-sort techniques.



Metfessel, Newton S., & Michael, William B. "A Paradigm Involving
Multiple Criterion Measures for the Evaluation of Effectiveness of
School Programs", Educational & Psychological Measurement, 1967,
27, 931-943.



4. Self-evaluation measures: student's own reports on his
perceived or desired level of achievement, on his
perceptions of his personal and social adjustment, and on
his future academic and vocational plans.

5. Teacher-devised projective devices such as casting
characters in the class play, role playing, and picture
interpretation based on an informal scoring model that
usually embodies the determination of frequencies of the
occurrence of specific behaviors, or ratings of their
intensity or quality.

6. Teacher-made achievement tests (objective and essay), the
scores on which allow inferences regarding the extent to
which specific instructional objectives have been attained.

7. Teacher-made rating scales and check lists for observation
of classroom behaviors; performance levels of speech, music
and art; manifestation of creative endeavors, personal and
social adjustment, physical wellbeing.

8. Teacher-modified forms (preferably with consultant aid)
of the semantic differential scale.
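
Item B.3 above mentions indices of congruence between real self and
ideal self. The source does not prescribe a formula. As a rough
sketch only, one common choice is the Pearson correlation between a
student's real-self and ideal-self Q-sort placements. The function
name, the 1-9 scale, and the sample placements below are hypothetical
and serve only as an illustration.

```python
from math import sqrt


def congruence_index(real_self, ideal_self):
    """Pearson correlation between two equal-length lists of Q-sort placements.

    Assumption: each list holds the scale position a student assigned to the
    same statements, once for "real self" and once for "ideal self".
    """
    if len(real_self) != len(ideal_self) or len(real_self) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    n = len(real_self)
    mean_r = sum(real_self) / n
    mean_i = sum(ideal_self) / n
    # Cross-products and sums of squares about the means.
    cov = sum((r - mean_r) * (i - mean_i) for r, i in zip(real_self, ideal_self))
    var_r = sum((r - mean_r) ** 2 for r in real_self)
    var_i = sum((i - mean_i) ** 2 for i in ideal_self)
    if var_r == 0 or var_i == 0:
        raise ValueError("placements must vary for a correlation to be defined")
    return cov / sqrt(var_r * var_i)


# Hypothetical placements of six statements on a 1-9 scale.
real = [2, 5, 7, 3, 8, 4]
ideal = [3, 6, 8, 2, 9, 5]
print(round(congruence_index(real, ideal), 3))
```

A value near +1 would suggest close agreement between the two sorts,
and a value near -1 strong disagreement. An evaluator would still
need to judge what size of change counts as meaningful.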

C. Indicators of Status or Change in Student Behavior Other Than
Those Measured by Tests, Inventories, and Observation Scales
in Relation to the Task of Evaluating Objectives of School
Programs

1. Absences: full-day, half-day, part-day, and other selective
indices pertaining to frequency and duration of lack of
attendance.

2. Anecdotal records: critical incidents noted including
frequencies of behaviors judged to be highly undesirable
or highly deserving of commendation.

3. Appointments: frequencies with which they are kept or
broken.

4. Articles and stories: numbers and types published in school
newspapers, magazines, journals, or proceedings of student
organizations.

5. Assignments: numbers and types completed with some sort
of quality rating or mark attached.

6. Attendance: frequency and duration when attendance is
required or considered optional (as in club meetings,
special events, or off-campus activities).



7. Autobiographical data: behaviors reported that could
be classified and subsequently assigned judgmental
values concerning their appropriateness relative to
specific objectives concerned with human development.

8. Awards, citations, honors, and related indicators of
distinctive or creative performance: frequency of
occurrence or judgments of merit in terms of scaled
values.

9. Books: numbers checked out of library, numbers renewed,
numbers reported read when reading is required or when
voluntary.

10. Case histories: critical incidents and other passages
reflecting quantifiable categories of behavior.

11. Changes in program or in teacher as requested by student:
frequency of occurrence.

12. Choices expressed or carried out: vocational, avocational,
and educational (especially in relation to their judged
appropriateness to known physical, intellectual, emotional,
social, aesthetic, interest, and other factors).

13. Citations: commendatory in both formal and informal media
of communication such as in the newspaper, television,
school assembly, classroom, bulletin board, or elsewhere
(see Awards).

14. "Contacts": frequency or duration of direct or indirect
communications between persons observed and one or more
significant others with specific reference to increase or
decrease in frequency or to duration relative to selected
time intervals.

15. Disciplinary actions taken: frequency and type.

16. Dropouts: numbers of students leaving school before
completion of program of studies.

17. Elected positions: numbers and types held in class,
student body, or out-of-school social groups.

18. Extracurricular activities: frequency or duration of
participation in observable behaviors amenable to
classification such as taking part in athletic events, charity
drives, cultural activities, and numerous service-related
avocational endeavors.

19. Grade placement: the success or lack of success in being
promoted or retained; number of times accelerated or
skipped.



20. Grade point average: including numbers of recommended
units of course work in academic as well as in
non-college preparatory programs.

21. Grouping: frequency and/or duration of moves from one
instructional group to another within a given class grade.

22. Homework assignments: punctuality of completion,
quantifiable judgments of quality such as class marks.

23. Leisure activities: numbers and types of; times spent
in; awards and prizes received in participation.

24. Library card: possessed or not possessed; renewed or
not renewed.

25. Load: numbers of units or courses carried by students.

26. Peer group participation: frequency and duration of
activity in what are judged to be socially acceptable
and socially undesirable behaviors.

27. Performance: awards, citations received; extra credit
assignments and associated points earned; numbers of books
or other learning materials taken out of the library;
products exhibited at competitive events.

28. Recommendations: numbers of and judged levels of
favorableness.

29. Recidivism by students: incidents (presence or absence
or frequency of occurrence) of a given student's returning
to a probationary status, to a detention facility, or to
observable behavior patterns judged to be socially
undesirable (intoxicated state, dope addiction, hostile acts
including arrests, sexual deviation).

30. Referrals: by teacher to counselor, psychologist, or
administrator for disciplinary action, for special aid in
overcoming learning difficulties, for behavior disorders,
for health defects or for part-time employment activities.

31. Referrals: by student himself (presence, absence, or
frequency).

32. Service points: numbers earned.

33. Skills: demonstration of new or increased competencies
such as those found in physical education, crafts,
homemaking, and the arts that are not measured in a
highly valid fashion by available tests and scales.




34. Social mobility: numbers of times student has moved from
one neighborhood to another and/or frequency with which
parents have changed jobs.

35. Tape recordings: critical incidents contained and other
analyzable events amenable to classification and
enumeration.

36. Tardiness: frequency of.

37. Transiency: incidents of.

38. Transfers: numbers of students entering school from
another school (horizontal move).

39. Withdrawal: numbers of students withdrawing from school
or from a special program (see Dropouts).
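
Many of the indicators in list C reduce to a frequency and a duration
per student (absences, tardiness, contacts, and so on). The following
is a minimal sketch of such a tally, assuming hypothetical records of
(student, indicator, duration in days). The record layout and all
names are illustrative and do not come from the source.

```python
from collections import defaultdict


def tally(records):
    """Return {(student, indicator): (frequency, total_duration)}.

    Assumption: each record is a (student, indicator, duration) tuple,
    with duration expressed in days.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for student, indicator, duration in records:
        entry = totals[(student, indicator)]
        entry[0] += 1          # frequency: one more occurrence
        entry[1] += duration   # duration: accumulate days
    return {key: (count, total) for key, (count, total) in totals.items()}


# Hypothetical records for two students.
records = [
    ("student_a", "absence", 1.0),
    ("student_a", "absence", 0.5),
    ("student_b", "absence", 1.0),
    ("student_a", "tardiness", 0.1),
]
summary = tally(records)
print(summary[("student_a", "absence")])
```

Keeping frequency and duration as separate figures matches the
wording of items such as Absences and Attendance, which name both.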

D. Indicators of Status or Change in Cognitive and Affective
Behaviors of Teachers and Other School Personnel in Relation
to the Evaluation of School Programs

1. Articles: frequency and types of articles and written
documents prepared by teachers for publication or
distribution.

2. Attendance: frequency of, at professional meetings or
at inservice training programs, institutes, summer
schools, colleges and universities (for advanced training)
from which inferences can be drawn regarding the
professional person's desire to improve his competence.

3. Elective offices: numbers and types of appointments held
in professional and social organizations.

4. Grade point average: earned in postgraduate courses.

5. Load carried by teacher: teacher-pupil or
counselor-pupil ratio.

6. Mail: frequency of positive and negative statements in
written correspondence about teachers, counselors,
administrators, and other personnel.

7. Memberships including elective positions held in
professional and community organizations: frequency and
duration of association.

8. Model congruence index: determination of how well the
actions of professional personnel in a program approximate
certain operationally-stated judgmental criteria concerning
the qualities of a meritorious program.

9. Moonlighting: frequency of outside jobs and time spent
in these activities by teachers or other school personnel.

10. Nominations by peers, students, administrators or parents
for outstanding service and/or professional competencies:
frequency of.

11. Rating scales and checklists (e.g., graphic rating scales
or the semantic differential) of operationally-stated
dimensions of teachers' behaviors in the school setting
from which observers may formulate inferences regarding
changes of behavior that reflect what are judged to be
desirable gains in professional competence, skills,
attitudes, adjustment, interests and work efficiency;
the perceptions of various members of the total school
community (parents, teachers, administrators, counselors,
students, and classified employees) of the behaviors of
other members may also be obtained and compared.

12. Records and reporting procedures practiced by
administrators, counselors, and teachers: judgments of
adequacy by outside consultants.

13. Termination: frequency of voluntary or involuntary
resignation or dismissals of school personnel.

14. Transfers: frequency of requests of teachers to move from
one school to another.

E. Indicators of Community Behaviors in Relation to the
Evaluation of School Programs

1. Alumni participation: numbers of visitations, extent of
involvement in PTA activities, amount of support of a
tangible (financial) or a service nature to a continuing
school program or activity.

2. Attendance at special school events, at meetings of the
board of education, or at other group activities by
parents: frequency of.

3. Conferences of parent-teacher, parent-counselor,
parent-administrator sought by parents: frequency of request.

4. Conferences of the same type sought and initiated by
school personnel: frequency of requests and record of
appointments kept by parents.

5. Interview responses amenable to classification and
quantification.



6. Letters (mail): frequency of requests for information,
materials, and servicing.

7. Letters: frequency of praiseworthy or critical comments
about school programs and services and about personnel
participating in them.

8. Participant analysis of alumni: determination of locale
of graduates, occupation, affiliation with particular
institutions, or outside agencies.

9. Parental response to letters and report cards upon
written or oral request by school personnel: frequency
of compliance by parents.

10. Telephone calls from parents, alumni, and from personnel
in communications media (e.g., newspaper reporters):
frequency, duration, and quantifiable judgments about
statements monitored from telephone conversations.

11. Transportation requests: frequency of.





